Logical Inferences with Contexts of RDF Triples Vinh Nguyen Amit Sheth Kno.e.sisCenter Kno.e.sisCenter WrightStateUniversity WrightStateUniversity Dayton,Ohio,USA Dayton,Ohio,USA [email protected] [email protected] ABSTRACT knowledge base, as a set of RDF triples, can be created us- 7 Logical inference, an integral feature of the Semantic Web, ingdifferentmethods. Existingdatafromastructuredform 1 istheprocessofderivingnewtriplesbyapplyingentailment (e.g., relational databases, text files, XML documents, or 0 HTML pages) can be transformed into RDF with ontolo- rules on knowledge bases. The entailment rules are deter- 2 giesdescribingthedatabaseschema(e.g.,Bio2RDF[8]and minedbythemodel-theoreticsemantics. Incorporatingcon- PubChem [12]). A knowledge base can also be created by n textofanRDFtriple(e.g.,provenance,time,andlocation) extractingthetriplesintheformof(subject,predicate,ob- a intotheinferencingprocessrequirestheformalsemanticsto J be capable of describing the context of RDF triples also in ject)fromunstructureddatausingnaturallanguageprocess- ing algorithms (e.g., Google Knowledge Vault [11], Yago2S 0 theformoftriples,orinotherwords,RDFcontextualtriples [17], andDBpedia[20]). Ineithermethod, eachRDFtriple 2 about triples. The formal semantics should also provide the in the resulting knowledge bases can be associated and en- rules that could entail new contextual triples about triples. riched with different types of contextual information, such ] Inthispaper,weproposethefirstinferencingmechanism I asthetimedurationinwhichthetripleholdstrue, andthe A that allows context of RDF triples, represented in the form provenance specifying the source of the triple. of RDF triples about triples, to be the first-class citizens in . If a knowledge base is created and maintained by an or- s the model-theoretic semantics and in the logical rules. Our c inferencemechanismiswell-formalizedwithallnewconcepts ganization, the knowledge integrity can be validated within [ thatorganization. Nowadays,itiscommonplaceforaknowl- being captured in the model-theoretic semantics. This for- malsemanticsalsoallowsustoderiveanewsetofentailment edge base to be created and shared by anyone on the Web, 1 orintheLinkedOpenData. Therefore,webelievethatthe v rules that could entail new contextual triples about triples. contextofeveryfactorassertionshouldbeprovidedforcon- 4 To demonstrate the feasibility and the scalability of the sumerstovalidateandassessthereliabilityoftheknowledge 2 proposedmechanism,weimplementanewtoolinwhichwe before using it. The contextual information of a triple may 7 transformtheexistingknowledgebasestoourrepresentation providethetimeintervalorthetimeinstantwhenthetriple 5 ofRDFtriplesabouttriplesandprovidetheoptionforthis holdstruesothattheconsumerscanvalidateit. Itmayalso 0 tool to compute the inferred triples for the proposed rules. providetheWebpageorthearticlewherethetriplewasex- . We evaluate the computation of the proposed rules on a 1 tractedfrom,oranyprovenanceinformationthatallowsfor large scale using various real-world knowledge bases such 0 tracking the origin of the triples so that the consumers can as Bio2RDF NCBI Genes and DBpedia. The results show 7 assess the reliability of the triples. Since a fact would not that the computation of the inferred triples can be highly 1 hold true in every context, the context in which a proposi- scalable. On average, one billion inferred triples adds 5-6 : v minutestotheoveralltransformationprocess. NCBIGenes, tionholdstrueneedstobepresentedintheknowledgebases i with20billiontriplesintotal,tookonly232minutesforthe so that the proposition can be validated and reused. X Popular knowledge bases currently represent the contex- transformationof12billiontriplesandadded42minutesfor r tualinformationoftheirtriplesintheRDFquadform. For inferring 8 billion triples to the overall process. a example, DBpedia [6], CTD [1], and GO Annotations [2], NCBI Genes [4], and PharmGKB [5] have the provenance 1. INTRODUCTION ofeveryRDFtriplerepresentedinthequadform. Sincethe Semantic Web technologies such as RDF and OWL are fourth element of the RDF quad is not formalized as first- emergingasstandardlanguagesformachine-understandable class citizen of the current model-theoretic semantics [15], knowledge representation and reasoning. A Semantic Web itcannotberepresentedintheentailmentrules. Therefore, to support the inferences involving contextual information about RDF triples, we believe that the current RDF and OWL semantics should be expanded to allow for the acco- modation of contexts of triples as first-class citizens in the This work is licensed under the Creative Commons Attribution- model-theoretic semantics and entailment rules. NonCommercial-NoDerivatives4.0InternationalLicense. Toviewacopy 1.1 MotivatingExample ofthislicense,visithttp://creativecommons.org/licenses/by-nc-nd/4.0/.For anyusebeyondthosecoveredbythislicense,obtainpermissionbyemailing Givenastatement“BarackObamaismarriedtoMichelle [email protected]. Obama” (T ), and a sub-property relationship between is- ProceedingsoftheVLDBEndowment,Vol.10,No.4 1 MarriedToandisSpouseOf (T ),applyingtherulerdfs7[14] Copyright2016VLDBEndowment2150-8097/16/12. 2 Derivingstatements T : BarackObama isMarriedTo MichelleObama . 1 S1. BarackObamaismarriedtoMichelleObamainIllinois. T4: T1 happenedIn Chicago . S2. BarackObamaismarriedtoMichelleObamainUSA. T5: Chicago partOf Illinois . S3. BarackObamabecamespouseofMichelleObamainChicago. T6: Illinois partOf USA . S4. BarackObamabecamespouseofMichelleObamainIllinois. We call T a primary triple and T a meta triple about S5. BarackObamabecamespouseofMichelleObamainUSA. location. T1he primary triple T is a4 regular RDF triple, 1 while the meta triple T describes the context in which the Table 1: Statements inferred from “BarackObama 4 primary triple holds true. is married to MichelleObama in Chicago” Considering the original statement represented in T and 1 T , there is no relationship between the triple “Barack- 4 Obama isMarriedTo MichelleObama” and its identifier T . 1 Therefore, there is no relationship between the primary forrdfs:subPropertyOfonthetwotripleswillentailthenew triple T and the meta triple T . triple“BarackObamaisaspouseofMichelleObama”(T ). 1 4 3 Tobridgethisgapandfulfilltherequirementofrepresent- T1: BarackObama isMarriedTo MichelleObama . ing contextual statements in machine-understandable form T2: isMarriedTo subPropertyOf isSpouseOf . asdiscussedearlier,severalapproachessuchasnamedgraph T3: BarackObama isSpouseOf MichelleObama . [9], RDF reification [14], and singleton property [22] can be Thesetripleswouldprovideenoughknowledgeforanswer- used for representing the relationship between a triple and ing simple questions. For example, who is Barack Obama itsidentifier. Furthermore,thesemanticsoftherelationship married to? Or who is spouse of Barack Obama? betweenatripleanditsidentifiermustbecapturedandex- However,thesetriplesdonotprovidesufficientknowledge pressed as a first-class citizen in the formal semantics. It to give answers to more complex questions. For example, allows the semantics of the contextual statements to be ex- whenandwheredidBarackObamamarryMichelleObama? pressed in the formal model as well as in the logical rules. Was he married to Michelle Obama before 2008? Was he a Among the existing approaches, the singleton property spouse of Michelle Obama in Harvard, Massachusetts? representation comes with a formal semantics formalizing To answer these questions, contextual information such the relationship between the singleton property and the as time and location need to be represented, as in “Barack triple it represents. Meanwhile, although the named graph Obama is married to Michelle Obama in Chicago in 1992”. couldbeusedasatripleidentifier,itwasnotintendedtobe Given this contextual statement and the knowledge that used for that purpose. Instead, the named graph is mainly ChicagoisacityinIllinois,andIllinoisisastateintheUSA, used for representing a set of triples. The RDF reification weashumanscanquicklyinfermanycontextualstatements does not have a formal semantics for capturing the rela- which could answer the above questions. Furthermore, we tionshipbetweenastatementinstanceandthetripleitrep- canalsoinferstatementsthatcouldbemuchdifferentfrom resents. As a result, we choose the singleton property (SP) the original statement and from each other. For example, representationanduseitssemanticsinourentailmentmech- the statements S and S in Table 1 only share the subject anism. Correspondingly, all the knowledge bases available 1 5 and the object while their relationships and contextual in- in the form of RDF reification or named graph need to be formationaretotallydifferent,asshowninTable1withthe transformed into the singleton property representation for inferred information in bold font. inference purposes. How does a machine become intelligent enough to infer This paper has two contributions: contextualstatementsashumansdo? Weaddressthisques- • We developed a new entailment mechanism that al- tionby(1)developinganinferencingmechanismthatcould lowscontextualstatements,representedintheformof entail new contextual statements and (2) demonstrating an RDF contextual triples about triples, to be the first- inferencing process that derives the human-inferred contex- class citizens in model-theoretic semantics as well as tual statements from the original statement. in the logical rules. The proposed mechanism is well- In order for machines to infer contextual statements, formalizedwithallnewconceptsforRDFtriplesabout we believe that two requirements need to be fulfilled. triples(Section2)beingcapturedbyamodel-theoretic First, these contextual statements must be represented in semantics (Section 3). The formal semantics also al- machine-understandable form, with explicitly defined se- lows us to derive a new set of rules that could entail mantics for the relationship between the triple and its con- new contextual statements (Section 4.1). We demon- textualinformation. Second,weneedanentailmentmecha- strate how the human-inferred contextual statements nismthattakesintoaccountthesemanticsofthecontextual in Table 1 can be entailed using the proposed rules triples about triples so that it can entail the new ones. (Section 4.2). To the best of our knowledge, there is no RDF/RDFS inferencing rule that involves RDF contextual triples about • Wedemonstratedthattheproposedentailmentmech- triples. Our paper addresses this missing capability. anismisscalable. Wedevelopedanewtool,calledrdf- contextualizer, to transform existing knowledge bases from the named graph form to the singleton property 1.2 Approach representation. We then provide the option to com- The contextual statement “Barack Obama is married to puteallinferredtriplesusingtheproposedcontextual Michelle Obama in Chicago” is used as an illustrative ex- entailment rules in this tool (Section 5). We evalu- ample throughout the paper. We represent this contextual atedthecomputationalperformanceinverylargereal- statementandthebackgroundknowledgeofChicagointhe world knowledge bases with billions of triples such as form of triples as follows: DBpedia and NCBI Genes (Section 6). This paper presents the foundational capability. While of pairs while the singleton property is mapped to a single thelocationcontextandsimplequestionansweringareused pair. If a singleton property also plays the role of a generic as the illustrative examples, discussions of a broad class of property, it will occur as the predicate in multiple triples. SemanticWebapplicationssuchasqueryingSPARQLwith This contradicts to the definition of the singleton property. backward-chaining reasoning, streaming reasoning, tempo- Therefore,asingletonpropertyshouldneverbedefinedasa ralreasoning,trackingcontextofinferredtriples,andques- generic property. In other words, the set of generic proper- tion answering based on extracted knowledge and logical ties and the set of singleton properties are disjoint. The inferences, are beyond the scope of the paper. two classes SingletonProperty and GenericProperty do The remaining sections are as follows. We present the not share common instances. related work in Section 7. We discuss the future work in EverygenericpropertyisanRDFproperty,butnotevery Section 8 and conclude with Section 9. RDFpropertyisagenericpropertyas,forexample,itcould beasingleton. ThatmakesGenericPropertyasub-classof 2. CONCEPTUALMODEL Property. rdf:GenericProperty rdfs:subClassOf rdf:Property . 2.1 Preliminaries rdf:SingletonProperty rdfs:subClassOf rdf:Property . Every generic property is an instance of the Property Herewerecallthesingletonpropertyconceptwithitssyn- class. However, one Property instance may not belong to tax and semantics from [22]. anyoftheclassesSingletonPropertyorGenericProperty. A singleton property is a specific property instance that We call this a regular property. In the example at hand, represents a unique relationship under a specific context. partOf is a regular property. For the example at hand, the singleton property isMar- Althougharegularpropertyinsomecasesmaysharethe riedTo#1 uniquely represents the isMarriedTo relationship same property extension with a generic property, it is nec- betweenBarackObamaandMichelleObama. Thissingleton essary to make the clear distinction between them. property can be asserted with contextual information “in Distinguishing property types. Here we distinguish Chicago” about the relationship as follows: three types of properties: singleton, generic, and regular. SP1: BarackObama isMarriedTo#1 MichelleObama. We also call them context-associated, context-dissociated, SP2: isMarriedTo#1 singletonPropertyOf isMarriedTo. and context-agnostic property, respectively. A singleton SP3: isMarriedTo#1 happenedIn Chicago. propertysuchasisMarriedTo#1iscalledcontext-associated Formalsemantics. Herewerecallthemappingfunction because it can be asserted with contextual information. I from the current model-theoretic semantics [15]. EXT A generic property, or context-dissociated property such A property mapping function I is a binary relation EXT as isMarriedTo, does not have contextual information at- that maps one property to a set of pairs of resources. tached. Finally, a regular property such as happenedIn, in- Formally, let IP be the set of properties, IR be the set of stance of Property class, is called a context-agnostic prop- resources. Then I : IP → 2IR×IR. EXT erty because it is not yet committed to be associated or AsingletonmappingfunctionI isabinaryrelation SEXT dissociated with contextual information. that maps one singleton property to one pair of resources. Formally, let IR be the set of resources, IPs ⊆ IP be the 2.3 TripleTypes set of singleton properties. Then I : IPs → IR × IR. SEXT Similar to the distinction of the property types, here we For example, I (isMarriedTo#1) = (cid:104)BarackObama, SEXT canalsodistinguishthetypeofatriplebasedonthetypeof MichelleObama(cid:105). AsthesingletonpropertyisMarriedTo#1 itsproperty. Thethreetypesofpropertiesformthreetypes is also a property, its property extension is a singleton set, of triples: singleton, generic, and regular triple. A single- which has only one element. ton triple is the only triple that has its singleton property I (isMarriedTo#1) = (cid:104)BarackObama, MichelleObama(cid:105). EXT occurring as a predicate. A generic triple is a triple that The syntax and the semantics of singleton properties de- has its predicate asserted as a generic property. A regular scribed here are sufficient to allow us to attach contextual tripleisatriplethatdoesnothaveitspredicateassertedas information for any RDF individual triple. a generic or singleton property. Next,wedescribehowsingletonpropertiescanbeutilized Singleton: BarackObama isMarriedTo#1 MichelleObama. for developing our inference scheme to infer new triples. Generic: BarackObama isMarriedTo MichelleObama. Regular: Chicago partOf Illinois. 2.2 PropertyTypes Distinguishing triple types based on context. The A generic property asserts the relationship between the threetypesofpropertiesformthreetypesoftriples: context- subject and the object without providing additional con- associated, context-dissociated, andcontext-agnostictriple. textual information about the relationship. This property A context-associated triple is a singleton triple that may groupsallsingletonpropertiessharingthesamecharacteris- have contextual information associated with it via its sin- ticsacrosscontexts,anditisconnectedtoasingletonprop- gleton property. A context-dissociated triple is a generic erty via the property singletonPropertyOf. triple that does not have any contextual information asso- SP2: isMarriedTo#1 singletonPropertyOf isMarriedTo. ciated with it. A context-agnostic triple is a regular triple In this example, isMarriedTo is a generic property. that is not committed to be associated or dissociated with Here we propose to add the new class GenericProperty contextual information. to represent the set of generic properties, in addition to 2.4 Contextualtripleinstantiation the class SingletonProperty from [22]. Any property that is not defined as a singleton property can become generic Back to the motivating example, we have the original property. Intuitively, a generic property is mapped to a set statement “Barack Obama married to Michelle Obama in Chicago” represented as follows: Next, we describe model-theoretic semantics using these T : BarackObama isMarriedTo MichelleObama . mapping functions. 1 T : T happenedIn Chicago . For RDF and RDFS, the entailments are determined by 4 1 themodel-theoreticsemantics. ForanentailmentrelationX Weutilizethesingletonpropertygraphpatterntobridge (e.g.,RDFSentailment),anRDFgraphGissaidtoX-entail the gap between the primary triple T and its identifier as 1 an RDF graph G(cid:48) if each X-interpretation that satisfies G explained in Section 2.1. also satisfies G(cid:48). This definition is completed with a math- SP1: BarackObama isMarriedTo#1 MichelleObama. ematical definition of X-interpretation. SP2: isMarriedTo#1 singletonPropertyOf isMarriedTo. Wespecifythreeinterpretations: simple,RDFandRDFS SP3: isMarriedTo#1 happenedIn Chicago. byextendingthemodel-theoreticsemanticsdescribedin[16, This singleton property graph pattern represents the pri- 22]. For each interpretation, we add additional criteria for marytripleT anditsmetatripleT bycreatingasingleton 1 4 supportingthesingletonpropertyandthegenericproperty. property isMarriedTo#1 and using it as triple identifier for While we explain the new vocabulary elements in detail, asserting meta triples. elements without further explanation remain as they are in From triple SP , isMarriedTo#1 is the singleton prop- 2 theoriginalmodel-theoreticsemanticsdescribedin[16,22]. erty and isMarriedTo is a generic property. According to the triple type classification in Section 2.3, T is a generic 3.2 SimpleInterpretation 1 triple and SP is a singleton triple. SP can be considered 1 1 Inthesimpleinterpretation,weformalizethethreetypes as one contextual triple instance of T . We generalize this 1 of properties as described in Section 2. We also use the relationship between the two triples as follows. mapping functions described in Section 3.1 to assign each Definition. Letspbethesingletonpropertyofproperty property to a semantic construct. p and (s, sp, o) be the singleton triple, (s, sp, o) is the GivenavocabularyV,thesimpleinterpretationIconsists contextual triple instance of the generic triple (s, p, o). of: How a generic triple (s, p, o) can be derived from the singletongraphpatternwillbeformalizedintheRDFformal 1. IR, a non-empty set of resources, alternatively called semantics described in Section 3.3. domain or universe of discourse of I, 2. IP, the set of properties of I, 3. MODEL-THEORETICSEMANTICS 3. IPs, called the set of singleton properties of I, as a subset of IP, 3.1 MappingFunctions Intheformalsemantics,weusetheconceptoffunctionto 4. IPg,calledthesetofgenericpropertiesofI,asasubset assign the set of URIs and literals into the set of resources of IP, IPs ∩ IPg = ∅, inamodelinterpretation,andassigneachpropertyasetof 5. IPr, called the set of regular properties, pairs (subject, object). IPr = IP \ (IPs ∪ IPg), Next, we define new mapping functions to be used in the interpretationsdescribedlaterinthissection. Wereusethe 6. I , a function assigning a generic property to a set of G set of singleton properties IPs, the set of properties IP, the its singleton properties, mappingfunctionI ,andthesingletonmappingfunction EXT ISEXT mentioned in Section 2.1. 7. IEXT, a mapping function assigning to each property Definition. AgenericpropertyinstancefunctionI isa a set of pairs from IR, G binary relation that maps a generic property to a set of its I : IP →2IR×IR where I (p) is called the ex- EXT EXT singleton properties. tension of property p, Formally, let IPg ⊆ IP be the set of generic properties. We define the function 8. ISEXT(ps), the singleton mapping function assigning I : IPg →2IPs such that a singleton property to a pair of resources. G IG(pg)={ps|(cid:104)ps,pg(cid:105)∈IEXT(rdf:singletonPropertyOf)}. ISEXT : IPs →IR×IR. Forexample,isMarriedToisagenericpropertyandithas two singleton properties as follows: 9. IGEXT(pg), the generic mapping function assigning each generic property a set of pairs of resources. isMarriedTo#1 rdf:singletonPropertyOf isMarriedTo . isMarriedTo#2 rdf:singletonPropertyOf isMarriedTo . IGEXT : IPg →2IR×IR. IG(isMarriedTo) = {isMarriedTo#1, isMarriedTo#2}. Note that the mapping function ISEXT is not a one-to- Definition. A generic mapping function IGEXT is a bi- onemapping;multiplesingletonpropertiesmaybemapped nary relation that maps a generic property to a set of pairs to the same pair of entities. of resources. I : IPg → 2IR×IR such that 3.3 RDFInterpretation GEXT IGEXT(pg)={ISEXT (ps)|ps ∈ IG(pg)}. In the RDF interpretation, we formalize the definition of Since IPg ⊆ IP, we have IGEXT(pg)⊆IEXT(pg). singleton property, generic property, and how to derive the We also have IEXT(ps) = {(cid:104)s,o(cid:105)|(cid:104)s,o(cid:105)=ISEXT(ps)}. generic triple from a singleton graph pattern. IEXT(pg) = {(cid:104)s,o(cid:105)|(cid:104)s,o(cid:105)∈IGEXT(pg)}. RDF interpretation of a vocabulary V is a simple in- In the example at hand, we have: terpretation I of the vocabulary V ∪ V that satisfies RDF I (isMarriedTo#1)={(cid:104)BarackObama,MichelleObama(cid:105)}, the criteria from the current RDF interpretation [15] and EXT IEXT(isMarriedTo)={(cid:104)BarackObama,MichelleObama(cid:105)}. the following criteria: 1. Define a singleton property 2. Class rdf:GenericProperty xs ∈ IPs iff (cid:104)rdf:GenericPropertyI,rdfs:ClassI(cid:105)∈I (rdf:typeI). (cid:104)x , rdf:SingletonProperty I(cid:105)∈I (rdf:type I(cid:105). EXT s EXT The extension of the rdf:GenericProperty class is the 2. Singleton condition set IPg of all generic properties, or Ifxs ∈IPsthen∃!(cid:104)u,v(cid:105):(cid:104)u,v(cid:105)=ISEXT(xs),andu,v IPg =ICEXT(rdf:GenericProperty I). ∈IR.Thisenforcesthesingleton-nessfortheproperty instances. 3. Every singleton property is a resource (cid:104)rdf:SingletonProperty I, rdfs:Resource I(cid:105)∈ 3. Define a generic property I (rdfs:subClassOf I), this causes IPs ⊆ IR. x ∈ IPg iff EXT g (cid:104)xg, rdf:GenericProperty I(cid:105)∈IEXT (rdf:type I). 4. Every generic property is a resource Ifx∈/IPs,thenIPg=IPg∪{x}. Anypropertythatis (cid:104)rdf:GenericProperty I, rdfs:Resource I(cid:105) notdefinedasasingletonpropertycanbecomegeneric ∈I (rdfs:subClassOf I), this causes IPg ⊆ IR. property. EXT 5. Domain of singleton property (rule rdfs-sp-1) 4. Infersingletonpropertyandgenericproperty(rulerdf- sp-1 and rdf-sp-2) (cid:104)xs,x(cid:105) ∈ IEXT(rdf:singletonPropertyOf I), (cid:104)x,y(cid:105) ∈ I (rdfs:domainI),if(cid:104)u,v(cid:105)∈I (x ),thenu∈ x ∈ IPs and x ∈ IPg if EXT SEXT s s g (cid:104)x , x (cid:105) ∈I (rdf:singletonPropertyOf I). ICEXT(y). A singleton property shares the domain s g EXT with its generic property. Asingletonpropertyx isconnectedtoagenericprop- s erty x via the property rdf:singletonPropertyOf. As 6. Range of singleton property (rule rdfs-sp-2) g IG(xg) is the set of singleton properties connected to (cid:104)x ,x(cid:105) ∈ I (rdf:singletonPropertyOf I), (cid:104)x,y(cid:105) ∈ s EXT the property xg, I (rdfs:range I), if (cid:104)u,v(cid:105) = I (x ), then v ∈ EXT SEXT s IG(xg)={xs| ICEXT(y). Asingletonpropertyalsosharestherange (cid:104)xs,xg(cid:105)∈IEXT(rdf:singletonPropertyOfI)}. with its generic property. 5. Generic mapping extension 7. Sub-property condition If (cid:104)xs, xg(cid:105) ∈ IEXT(rdf:singletonPropertyOf I), then If x,y ∈ IPg, (cid:104)x, y(cid:105) ∈ IEXT(rdfs:subPropertyOf I), xs ∈ IPs, xg ∈ IPg, and ISEXT(xs) ∈ IGEXT(xg). then IG(x)⊆IG(y), and IGEXT(x)⊆IGEXT(y). I (x ) is called a generic mapping extension of GEXT g the generic property x . 8. Sub-property upper bound condition g IGEXT(xg)={ISEXT (xs)|xs ∈ IG(xg)}, and If y ∈ IPs and (cid:104)x, y(cid:105) ∈ IEXT(rdfs:subPropertyOf I), then x∈ IPs and I (x)=I (y). I (x )⊆I (x ). SEXT SEXT GEXT g EXT g Proof. Since y ∈ IPs, ∃!(cid:104)u,v(cid:105) : (cid:104)u,v(cid:105) = I (y), SEXT 6. Generic triple derivation (rule rdf-sp-3) and I (y)={(cid:104)u,v(cid:105)}. EXT If (cid:104)u,v(cid:105)=I (x ), and SEXT s Since (cid:104)x, y(cid:105)∈I (rdfs:subPropertyOf I), by defini- (cid:104)x , x (cid:105)∈I (rdf:singletonPropertyOf I), EXT s g EXT tion, I (x)⊆I (y)={(cid:104)u,v(cid:105)}. then (cid:104)u,v(cid:105)∈I (x ). EXT EXT GEXT g Sincexcanbemappedtoatmostonepairofresources, Proof. (cid:104)x , x (cid:105) ∈ I (rdf:singletonPropertyOf I) s g EXT x∈ IPs, and I (x)=(cid:104)u,v(cid:105)=I (y). implies (1): I (x )∈I (x ). SEXT SEXT SEXT s GEXT g The combination of (cid:104)u,v(cid:105) = I (x ) and (1) im- 9. Sub-property lower bound condition (rule rdfs-sp-4) SEXT s plies (cid:104)u,v(cid:105)∈IGEXT(xg). If x∈ IPg and (cid:104)x, y(cid:105)∈IEXT(rdfs:subPropertyOf I), Thisshowshowthegenerictriple(cid:104)u,v(cid:105)∈I (x ) then y∈ IPg. GEXT g can be derived from its singleton graph pattern. Proof. Assume that y ∈ IPs, then x ∈ IPs (upper 3.4 RDFSInterpretation boundcondition). Wehavebothx∈IPsandx∈IPg. This contradicts to the condition IPs ∩ IPg = ∅. Here we formalize the connections from a singleton prop- Therefore,y∈/ IPs,andaccordingtothecondition(3) erty to its generic property, as well as to other properties of RDF interpretation, IPg = IPg ∪ {y}, or y∈ IPg. such as rdfs:domain, rdfs:range, and rdfs:subPropertyOf. We will reuse the function (from [15]) 10. Property hierarchy (rule rdfs-sp-3) I :IR→2IR whereI (y)iscalledaclassextension CEXT CEXT If (cid:104)x , x(cid:105)∈I (rdf:singletonPropertyOf I), and of y, I (y)={x|∀x∈IR:(cid:104)x,y(cid:105)∈I (rdf:typeI)}. s EXT CEXT EXT (cid:104)x, y(cid:105) ∈ I (rdfs:subPropertyOf I), then (cid:104)x , y(cid:105) ∈ RDFS interpretationofavocabularyVisanRDFin- EXT s I (rdf:singletonPropertyOf I). terpretation I of the vocabulary V ∪ V that satisfies EXT RDFS criteria from the current RDFS interpretation [15] and the Proof. (cid:104)xs, x(cid:105) ∈ IEXT(rdf:singletonPropertyOf I) following criteria: implies (1): xs ∈IG(x). (cid:104)x, y(cid:105)∈I (rdfs:subPropertyOf I) implies 1. Class rdf:SingletonProperty EXT (2): I (x)⊆I (y). (cid:104)rdf:SingletonPropertyI,rdfs:ClassI(cid:105)∈I (rdf:typeI). G G EXT (1) and (2) derive (3): x ∈I (y). Theextensionofrdf:SingletonPropertyclassistheset s G IPs of all singleton properties, or Inotherwords,xs isasingletonpropertyofy,or(cid:104)xs, IPs =ICEXT(rdf:SingletonProperty I). y(cid:105)∈IEXT(rdf:singletonPropertyOf I). 3.5 OWL2RDF-basedSemanticConditions A property p is an instance of the class From the RDF-based semantics of OWL 2 Full [23], we owl:IrreflexiveProperty iff ∀x: consider the semantic conditions of the OWL classes and (1) p∈IP,(cid:104)x,x(cid:105)∈/ I (p), EXT properties that are relevant to singleton properties. These (2) p∈IPg,(cid:104)x,x(cid:105)∈/ I (p), GEXT semantic conditions belong to the two categories: logical (3) ∀p ∈I (p),(cid:104)x,x(cid:105)(cid:54)=I (p ). characteristicsoftheproperties,andrelationstootherprop- s G SEXT s erties. We tighten the semantic conditions of these OWL • Symmetric property. If two individuals are re- classes and properties to make sure they are valid in the latedbyasymmetricproperty,thenthispropertyalso extended semantics, by enforcing more constraints on the relates them reversely. genericpropertyandsingletonpropertyextensions. These- A property p is an instance of the class manticconditionsofthesepropertiesmustbesatisfiedinthe owl:SymmetricProperty iff ∀x,y: interpretationsextendedwithsingletonpropertysemantics. Let Vp be the vocabulary of OWL classes and properties (1) p∈IP,(cid:104)x,y(cid:105)∈IEXT(p) implies (cid:104)y,x(cid:105)∈IEXT(p), relevanttosingletonproperties: Vp ={FunctionalProperty, (2) p ∈ IPg,(cid:104)x,y(cid:105) ∈ IGEXT(p) implies (cid:104)y,x(cid:105) ∈ InverseFunctionalProperty, ReflexiveProperty, Irreflexive- I (p), GEXT Property, SymmetricProperty, AsymmetricProperty, Tran- (3) ∀p ∈ I (p),(cid:104)x,y(cid:105) = I (p ) implies ∃p(cid:48) ∈ sitiveProperty, inverseOf, equivalentOf}. s G SEXT s s I (p),(cid:104)y,x(cid:105)=I (p(cid:48)). Letp ,p(cid:48),andp(cid:48)(cid:48)bethesingletonpropertiesofthegeneric G SEXT s s s s property p, then ps ∈ IG(p), p(cid:48)s ∈ IG(p), p(cid:48)s(cid:48) ∈ IG(p). We • Asymmetric property. If two individuals are re- define the OWL 2 RDF-based interpretation as follows. lated by an asymmetric property, then this property OWL 2 RDF-based interpretationofavocabularyV never relates them reversely. is an RDFS interpretation I of the vocabulary V ∪ V OWL A property p is an instance of the class that satisfies criteria from the OWL interpretation [23] and owl:AsymmetricProperty iff ∀x,y: the following semantic conditions: (1) p∈IP,(cid:104)x,y(cid:105)∈I (p) implies (cid:104)y,x(cid:105)∈/ I (p), EXT EXT • Functional property. If a property is functional, (2) p ∈ IPg,(cid:104)x,y(cid:105) ∈ I (p) implies (cid:104)y,x(cid:105) ∈/ thenatmostonedistinctvaluecanbeassignedtoany GEXT I (p), given individual via this property. GEXT (3) ∀p ∈ I (p),(cid:104)x,y(cid:105) = I (p ) implies (cid:64)p(cid:48) ∈ Apropertypisaninstanceofowl:FunctionalProperty s G SEXT s s I (p),(cid:104)y,x(cid:105)=I (p ). G SEXT s iff ∀x,y ,y : 1 2 • Transitive property. A transitive property that re- (1) p ∈ IP,(cid:104)x,y (cid:105) ∈ I (p),(cid:104)x,y (cid:105) ∈ I (p) im- 1 EXT 2 EXT latesanindividualatoanindividualb,andindividual plies y =y , 1 2 b to an individual c, also relates a to c. (2) p ∈ IPg,(cid:104)x,y (cid:105) ∈ I (p),(cid:104)x,y (cid:105) ∈ I (p) 1 GEXT 2 GEXT A property p is an instance of the class implies y =y , 1 2 owl:TransitiveProperty iff ∀x,y,z: (3) ∀p ∈ I (p),(cid:104)x,y (cid:105) = I (p ),(cid:104)x,y (cid:105) = s G 1 SEXT s 2 (1) p ∈ IP,(cid:104)x,y(cid:105) ∈ I (p),(cid:104)y,z(cid:105) ∈ I (p) implies I (p ) implies y =y . EXT EXT SEXT s 1 2 (cid:104)x,z(cid:105)∈I (p), EXT • Inverse functional property. An inverse func- (2) p ∈ IPg,(cid:104)x,y(cid:105) ∈ I (p),(cid:104)y,z(cid:105) ∈ I (p) GEXT GEXT tional property can be regarded as a “key” property, implies (cid:104)x,z(cid:105)∈I (p), GEXT i.e., no two different individuals can be assigned the (3) ∀p ,p(cid:48) ∈ I (p),(cid:104)x,y(cid:105) = I (p ),(cid:104)y,z(cid:105) = same value via this property. I s(p(cid:48)s) impliGes ∃p(cid:48)(cid:48) ∈I (p),(cid:104)xS,Ez(cid:105)X=T Is (p(cid:48)(cid:48)). SEXT s s G SEXT s A property p is an instance of • Inverse property. The inverse of a given property owl:InverseFunctionalProperty iff ∀x ,x ,y: 1 2 isthecorrespondingpropertywithsubjectandobject (1) p ∈ IP,(cid:104)x ,y(cid:105) ∈ I (p),(cid:104)x ,y(cid:105) ∈ I (p) im- 1 EXT 2 EXT swapped for each property assertion built from it. plies x =x , 1 2 (p ,p )∈I (owl:inverseOf I) iff 1 2 EXT (2) p ∈ IPg,(cid:104)x ,y(cid:105) ∈ I (p),(cid:104)x ,y(cid:105) ∈ I (p) 1 GEXT 2 GEXT (1) ∀p ,p ∈ IP,I (p ) = {(cid:104)x,y(cid:105)|(cid:104)y,x(cid:105) ∈ implies x =x , 1 2 EXT 1 1 2 I (p )}, EXT 2 (3) ∀p ∈ I (p),(cid:104)x ,y(cid:105) = I (p ),(cid:104)x ,y(cid:105) = s G 1 SEXT s 2 (2) ∀p ,p ∈ IPg,I (p ) = {(cid:104)x,y(cid:105)|(cid:104)y,x(cid:105) ∈ I (p ) implies x =x . 1 2 GEXT 1 SEXT s 1 2 I (p )}, GEXT 2 • Reflexive property. A reflexive property relates (3) ∀p ∈ I (p ),∀p(cid:48) ∈ I (p ),I (p ) = (cid:104)x,y(cid:105), s G 1 s G 2 SEXT s every individual in the universe to itself. I (p(cid:48))=(cid:104)y,x(cid:105). SEXT s A property p is an instance of the class • Equivalent property. Two equivalent properties owl:ReflexiveProperty iff ∀x: share the same property extension. (1) p∈IP,(cid:104)x,x(cid:105)∈IEXT(p), (p ,p )∈I (owl:equivalentOf I) iff 1 2 EXT (2) p∈IPg,(cid:104)x,x(cid:105)∈IGEXT(p), (1) ∀p1,p2 ∈IP :IEXT(p1)=IEXT(p2), (3) ∀ps ∈IG(p),(cid:104)x,x(cid:105)=ISEXT(ps). (2) ∀p1,p2 ∈IPg:IGEXT(p1)=IGEXT(p2), • Irreflexive property. An irreflexive property does (3) ∀ps ∈ IG(p1),∃p(cid:48)s ∈ IG(p2) : ISEXT(ps) = not relate any individual to itself. ISEXT(p(cid:48)s). 4. CONTEXTUALINFERENCES 4.2 ContextualInferencing In the RDF, RDFS, and OWL 2 Full interpretations, we Back to the motivating example, from the contextual have proved several deduction rules in Section 3. Here we statement “BarackObama is married to Michelle Obama in present a set of these rules in Section 4.1 and demonstrate Chicago”,weashumanscaninferalistofstatements(S to 1 how these rules can be applied to derive the inferred state- S )asshowninTable1. Herewedemonstratestep-by-step 5 ments described in the motivating example in Section 4.2. howtoinferstatementsS toS fromtheoriginalstatement. 1 5 Ourinitialknowledgebaseincludesthesingletonproperty 4.1 ContextualEntailmentRules graphpatternrepresentingtheoriginalcontextualstatement Thethreefollowingrdf-sprulesarederivedfromtheRDF and the background knowledge as follows: interpretation. SP1: BarackObama isMarriedTo#1 MichelleObama. u rdf:singletonPropertyOf v . SP2: isMarriedTo#1 singletonPropertyOf isMarriedTo. (rdf-sp-1) SP : isMarriedTo#1 happenedIn Chicago. u rdf:type rdf:SingletonProperty . 3 T2: isMarriedTo subPropertyOf isSpouseOf. u rdf:singletonPropertyOf v . T5: Chicago partOf Illinois. (rdf-sp-2) T6: Illinois partOf USA. v rdf:type rdf:GenericProperty . Assume that we also have a partOf-rule that, if x hap- pened in a place y which is a part of a bigger place z, then u rdf:singletonPropertyOf v . x also happened at the place z. x u y . (rdf-sp-3) x v y . x happenedIn y . y partOf z . (partOf-rule) The four rdfs-sp rules are derived from the RDFS interpre- x happenedIn z . tation. We start by applying the partOf-rule on the triples SP 3 u rdf:singletonPropertyOf v . and T , we obtain the triple SP . 5 4 v rdfs:domain x . (rdfs-sp-1) SP4: isMarriedTo#1 happenedIn Illinois . u rdfs:domain x . ThesingletontriplepatternincludingtriplesSP andSP 1 2 u rdf:singletonPropertyOf v . derivesthestatement“BarackObamaismarriedtoMichelle v rdfs:range y . Obama”accordingtorulerdf-sp-3. Combiningthreetriples (rdfs-sp-2) u rdfs:range y . SP1, SP2, and SP4 will derive the contextual statement S1 “Barack Obama isMarriedTo Michelle Obama in Illinois”. u rdf:singletonPropertyOf x . Re-applyingthepartOf-ruleonthetriplesSP4andT6,we x rdfs:subPropertyOf y . obtain the new triple SP5. (rdfs-sp-3) u rdf:singletonPropertyOf y . SP : isMarriedTo#1 happenedIn USA . 5 Similar to S , the combination of triples SP , SP , and x rdf:type rdf:GenericProperty . 1 1 2 SP derives the contextual statement S “Barack Obama x rdfs:subPropertyOf y . 5 2 (rdfs-sp-4) isMarriedTo Michelle Obama in USA”. y rdf:type GenericProperty . Next,ifweapplytherulerdfs-sp-3onthetriplesSP and 2 T , we obtain the new triple SP . u rdf:singletonPropertyOf x . 2 6 x rdfs:subPropertyOf y . SP6: isMarriedTo#1 singletonPropertyOf isSpouseOf. (rdfs-sp-5) y rdf:type GenericProperty . Applying the rule rdf-sp-3 on the triples SP1 and SP6 derives the statement “Barack Obama isSpouseOf Michelle The rule rdfs-sp-5 can easily be drived by combining the Obama”. two rules rdfs-sp-3 and rdf-sp-2. Similar to the property The combination of the triples SP , SP , and SP will rdfs:subPropertyOf, here we also provide the rules for the 1 6 3 derive the contextual statement S “Barack Obama is- owl:equivalentOf. 3 SpouseOf Michelle Obama in Chicago”. u rdf:singletonPropertyOf x . Similarly, combining the triples SP , SP , and SP will 1 6 4 x owl:equivalentOf y . derive the contextual statement S “Barack Obama is- (owl-sp-1) 4 u rdf:singletonPropertyOf y . SpouseOf Michelle Obama in Illinois”. The combination of SP , SP , and SP derives the con- 1 6 5 x rdf:type GenericProperty . textual statement S “Barack Obama isSpouseOf Michelle 5 x owl:equivalentOf y . Obama in USA”. (owl-sp-2) y rdf:type GenericProperty . Therefore, we have shownhow thecontextual statements S to S can be inferred in our approach. 1 5 u rdf:singletonPropertyOf x . x owl:equivalentOf y . 5. IMPLEMENTATION (owl-sp-3) y rdf:type GenericProperty . Hereweexplainhowwecomputeallinferredtriplesusing If x owl:equivalentOf y then the proposed entailment rules for existing knowledge bases (1) x rdfs:subPropertyOf y and intwosteps. Firstwedescribehowwetransformtheexisting (2) y rdfs:subPropertyOf x. knowledge bases into the singleton property representation The above three owl-sp rules can be derived easily by in Section 5.1. Then Section 5.2 describes how all the pro- combingthisruleandtherulesrdfs-sp-3,rdfs-sp-4,andrdfs- posed rules are computed for every triple in the resulting sp-5, respectively. knowledge bases. 5.1 TransformingRepresentation 6. EVALUATION AswediscussedearlierinSection1,knowledgebaseslike Weevaluatetheperformanceofcomputinginferredtriples DBpediaandBio2RDFrepresentthecontextualinformation from the proposed rules on a large scale by using the tool suchasprovenanceintheformofaquad. Beforecomputing rdf-contextualizer on real-world knowledge bases. the inferred triples for these datasets, we need to prepare 6.1 ExperimentSetup the knowledge bases by transforming them to the singleton property representation. WeuseasingleserverinstalledwithUbuntu12.04. Ithas Given any quad in the form of (s, p, o, g), we trans- 24cores,eachcoreisIntelXeonCPU2.60GHz. Weusetwo formittothesingletonpropertyrepresentationbycreating disks, one SSD 220GB for storing input datasets, and one a singleton property (sp , singletonPropertyOf, p) and as- HDdisk2.7Tforwritingtheoutput. Thisserverhas256GB i serting the singleton triple (s, sp , o). We use the property of RAM, however, we limit 60GB for each Java program. i wasDerivedFrom from the PROV ontology [19] to represent 6.2 Datasets theprovenanceofthetriple(sp ,prov:wasDerivedFrom,g). i The singleton property URIs are constructed by appending We downloaded and used the ontologies and RDF quad a unique string to the generic property URI, with an incre- datasets from DBpedia [6] and four Bio2RDF datasets in- mentalcounterfortheuniquenumberinthewholedataset. cluding NCBI Genes [4], PharmGKB [5], CTD [1], and GO We developed a Java 8 tool, called rdf-contextualizer, to Annotations [2] in our evaluation. We chose these quad transform any RDF datasets from the quad representation datasets because they are large and widely-used with high to the singleton property. We took advantages of the Jena impact in the community. For Bio2RDF datasets, we also RIOT API [7] with high throughput parsers for parsing an downloaded the Bio2RDF mapping files [3]. inputfilefromanyRDFformatandgeneratingastreamof RDF quads. For each quad stream, we created a pipeline of streams for converting each quad to the singleton prop- Table 2: Number of RDF quads in the original ertyrepresentation,shorteningtriplestoTurtleformat,and datasets and number of unique RDF quads in the writing them to gzip files though buffer writers. As each duplicate-removed datasets stream is handled by a separate thread, we can utilize the Dataset # of Quads # of Unique Quads CPU resources, especially the ones with multiple cores, by NCB-NG 4,043,516,408 2,010,283,374 creating multiple threads for parsing multiple files concur- DBP-NG 1,039,275,891 784,508,538 rently. CTD-NG 644,147,853 327,648,659 Wevalidatedthesyntaxoftheoutputdatasetsbywriting PHA-NG 462,682,871 339,058,720 an analyzer to parse the output files and also generate the GOA-NG 159,255,577 97,522,988 statistics reported in Section 6.3. WereportedthenumberofRDFquadsperdatasetinthe 5.2 ComputingInferredTriples first and second columns of Table 2. The dataset identifier is taken from the first 3 letters of its name. Runningallcontextualentailmentrulesoneverysingleton We observed that there were too many duplicate quads triple produces at least two more triples (rdf-sp-1 and rdf- among the files within each dataset. We also believe that sp-3). Thenumberofinferredtriplesgoesuptomulti-billion the duplicates may be created on purpose. Each of these with datasets like DBpedia and Bio2RDF. That amount of datasetshasanumberoffilesandeachfilecontainsanum- inferred triples cannot fit an in-memory reasoner such as ber of RDF quads for a topic. For example, NCBI Genes Jena [10]. The proposed entailment rules can also be com- dataset has one file for all genes belonging to one species. putedinthereasonerswiththesupportforuser-definedrules Therefore,wekeepthesedatasetsintheoriginalversionand suchasOracle[25]. However,fortherulesgeneratingalarge createdanewversionforeachdatasetwithallduplicatesbe- number of inferred triples, optimization is necessary as Or- ing removed. We removed the duplicates by 1) concatenat- acle has optimized the computation of large number of in- ingallfilesintoasinglefile,2)splittingthisfileintomultiple ferredtriplesforowl:sameAs[18]. Withouttakingsuchopti- smaller files, 3) sorting each small file, and 4) merging all mizationstepintheexistingengines,itistime-consumingto the sorted files into a single file. The third column of Ta- querytherulepatternsandinserttheinferredtriplestothe ble 2 shows the number of unique quads per dataset after store because the insert query is always expensive. There- removing the duplicates. fore, the proposed contextual entailment rules may not be Wethengeneratedthesingletonpropertyversionofeach best computed in the existing engines without taking the dataset by running the tool rdf-contextualizer with and optimization step. without the -infer option. We ran each dataset version at To demonstrate that the computation of the proposed least3timesandreportedtheaverageresultsinSection6.3. rules can be scalable with proper optimization, we imple- The tool rdf-contextualizer and the materials used in mented our engine in the tool rdf-contextualizer. Since the this paper are publicly available for reproducing the exper- proposed rules can be applied to each triple independently, iments1. we can pass the triples to concurrent tasks. While trans- forming the RDF quads to the singleton property represen- 6.3 Results tation, we added an optional inferring task in the stream We consider four dimensions in our evaluation: number pipelinestocomputetheproposedrulesforeverytriple. The of triples, number of singleton triples, run time, and disk timedifferencebetweentherunswithandwithoutthe-infer option would be the run time for inferring triples added to 1https://archive.org/services/purl/ the overall process. rdf-contextualizer Figure 1: Total number of triples for each dataset in all four cases: with vs. without reasoning and with vs. without removing duplicates. Figure 2: Total number of singleton triples for each dataset in all four cases: with vs. without reasoning and with vs. without removing duplicates. space. In all figures, the series Reasoning stands for run- NCBI genes, the largest dataset, contains about 8 billion ning the tool with -infer option and No-Reasoning stands inferred triples (66.67%) more than the NoReasoning-Dup for running the tool without -infer option. Running the version. Similarly, the Reasoning-Unique version of NCBI NoReasoningoptionprovidestheresultsasthebaselinecost. genes contains about 4 billion inferred triples more than ThedifferencesintheresultsbetweentheNoReasoningand theNoReasoning-Uniqueversion. Someotherdatasetshave Reasoningversionsaretheextracostestimatedforthecom- their number of inferred triples higher than 66.67%. For putationoftheinferredtriples. Werunthetoolontwover- example, the Reasoning-Unique version of CTD contains sions of datasets. The series with Dup denotes the datasets about 950 million inferred triples (96.67%) more than the withduplicatequadsandtheserieswithUniquedenotesthe NoReasoning-Unique version. datasets with all duplicates being removed. BetweentheDupandUniqueversions,theReasoning-Dup Combining the two options: with vs. without -infer and versions contain 50% (NCBI Genes), 32% ( DBpedia), 86% with vs. without removing duplicates, each evaluating di- (CTD), 35% (PharmGKB) and 63% (GOA) more triples mensionhasfourcases: Reasoning-Dup,Reasoning-Unique, than the corresponding Reasoning-Unique versions. NoReasoning-Dup, and NoReasoning-Unique. 6.3.2 NumberofInferredSingletonTriples 6.3.1 NumberofInferredTriples Resulted from the inference rules involving property hi- Figure1showsthetotalnumberoftriplesofeachdataset erarchy in the schema, the number of inferred singleton in four cases. triples varies across datasets as shown in Figure 2 since In general, the Reasoning cases contain larger num- theyhavedifferentschema. Forexample,sinceNCBIGenes ber of triples than the NoReasoning cases, from 66.7% datasetdoesnothavepropertyhierarchy,thenumberofsin- to 96.67%. For example, the Reasoning-Dup version of gleton triples remains the same in both version Reasoning Figure 3: Disk space (zipped) for each dataset in all four cases: with vs. without reasoning and with vs. without removing duplicates. Figure 4: Run time (in minutes) for each dataset in all four cases: with vs. without reasoning and with vs. without removing duplicates. and NoReasoning. Meanwhile, CTD dataset inferred 194 form, we chose the Turtle format to serialize the triples to million singleton triples (30%) for the Reasoning-Dup ver- files. It allows us to arrange the triple order so that we sion and 148 million singleton properties (45%) for the canshortenthestringsoftriplessharingsubject,orsubject Reasoning-Unique version. and predicate. For shortening the URIs in the Turtle for- mat, we compiled a list of prefixes used in these datasets. 6.3.3 DiskSpace Thanks to this compact representation, comparing to the NoReasoning-Dup version, the Reasoning-Dup version only Generally speaking, for every RDF quad, we generated 3 adds 15-25% more space for 66.7-96.97% more number of triplesforNoReasoningversionandatleast5triplesforthe triples (Figure 3). Reasoningversion. Thatmakestheoverallnumberoftriples ofeachSPdatasetincreaseupto3-5timescomparedtothe 6.3.4 RunTime numberofquadsofthecorrespondingnamedgraphdataset. Consequently, the disk size of one SP dataset could be 3-5 Figure 4 shows the run time execution for all four cases times more than the disk size of its corresponding named of each dataset. GO Annotations dataset, took 11 minutes graph dataset. This is the case in which the triples of SP for 480 million triples of NoReasoning-Dup version and 13 datasets being serialized into the N-Triple format. For very minutesfor796milliontriplesofReasoning-Dupversion. In large datasets like DBpedia and NCBI Genes, we do not otherwords,itadded2minutestotheoverallprocesstoinfer recommend this N-Triple serialization since the unzipped 316 million triples. NCBI Genes, the largest dataset, took files may require disks with tetrabyte capacity. 232minutesfor12billiontriplesofNoReasoning-Dupversion RDF Turtle format is more compact than the N-Triple and274minutesfor20billiontriplesofReasoning-Dupver- format, especially with very large datasets. Since our ap- sion. Intermsofpercentage,theReasoning-Dupversionsof proach enables data to be represented in the RDF triple thesedatasetsadd14-19%runtimeforinferring66.7-96.67%