Wright State University
CORE Scholar

2012

Linked Open Data Alignment & Querying

Prateek Jain
Wright State University

Part of the Computer Engineering Commons, and the Computer Sciences Commons

Repository Citation
Jain, Prateek, "Linked Open Data Alignment & Querying" (2012). Browse all Theses and Dissertations. 606.
https://corescholar.libraries.wright.edu/etd_all/606

Linked Open Data Alignment & Querying

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy

By
PRATEEK JAIN
B. Tech., DA-IICT, India, 2006

2012
Wright State University

COPYRIGHT BY
Prateek Jain
2012

WRIGHT STATE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

August 21, 2012

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Prateek Jain ENTITLED Linked Open Data Alignment & Querying BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Amit P. Sheth, Ph.D.
Dissertation Director

Arthur A. Goshtasby, Ph.D.
Director, Computer Science Ph.D. Program

Andrew T. Hsu, Ph.D.
Dean, School of Graduate Studies

Committee on Final Examination
Amit P. Sheth, Ph.D.
Pascal Hitzler, Ph.D.
Krishnaprasad Thirunarayan, Ph.D.
Peter Z. Yeh, Ph.D.
Kunal Verma, Ph.D.

ABSTRACT

Jain, Prateek. Ph.D., Department of Computer Science & Engineering, Wright State University, 2012. Linked Open Data Alignment & Querying.
The recent emergence of the Linked Data approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content", i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 295 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud, as we will illustrate, are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, then the LOD Cloud will merely be more data that suffers from the same kinds of problems that plague the Web of Documents, and hence the vision of the Semantic Web will fall short.

This thesis presents a comprehensive solution to address the issue of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes. The work identifies part-of relationships between instances. Between properties we establish subsumption and equivalence relationships. By bootstrapping we mean the process of utilizing the information contained within the datasets to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and the applicability of the solution.

Contents

1 Introduction
  1.1 Goals of this Dissertation
  1.2 Contributions
    1.2.1 Conceptual Contributions
    1.2.2 Artifacts
  1.3 Chapter Overview

2 Semantic Web and State of the Art
  2.1 Introduction
  2.2 Ontology
    2.2.1 Domain Specific Ontology
    2.2.2 Upper Level Ontology
    2.2.3 Basic Relationships present in Ontologies
  2.3 RDF
  2.4 SPARQL
    2.4.1 SPARQL Query Types

3 Linked Data
  3.1 Technology
  3.2 Linked Open Data
  3.3 Applications
  3.4 Challenges
    3.4.1 Absence of Schema Level Links
    3.4.2 Lack of Conceptual Description of Datasets
    3.4.3 Lack of expressivity
    3.4.4 Difficulties with respect to querying

4 Ontology Alignment for Concepts on Linked Open Data
  4.1 Ontology Matching
    4.1.1 History
    4.1.2 Techniques
      4.1.2.1 Name Matching
      4.1.2.2 Description Matching
      4.1.2.3 Constraint-based Matching
      4.1.2.4 Instance based Matching
    4.1.3 Tools
  4.2 BLOOMS Approach
  4.3 Evaluation
    4.3.1 Evaluation: Ontology Alignment Evaluation Initiative Oriented Track
    4.3.2 Evaluation: Ontology Alignment Evaluation Initiative Benchmark Track
  4.4 Related Work

5 Contextual Ontology Alignment of LOD
  5.1 Introduction
  5.2 Knowledge Requirements
  5.3 Approach
    5.3.1 Construct BLOOMS+ Forest
    5.3.2 Compute Class Similarity
    5.3.3 Compute Contextual Similarity
    5.3.4 Compute Overall Similarity
  5.4 Evaluation
    5.4.1 Data Set
    5.4.2 Experimental Setup
    5.4.3 Results and Discussion
  5.5 Related Work
  5.6 Conclusion

6 Partonomical Relationship Identification on Linked Open Data
  6.1 Introduction
  6.2 Winston's Approach to Part-of Relationships - Ontologized
  6.3 Approach
    6.3.1 Candidate Generation
    6.3.2 Hypothesis Generation
    6.3.3 Hypothesis Testing
  6.4 Evaluation
    6.4.1 Intra-Dataset Instance-Level Partonomy Discovery
    6.4.2 Inter-Dataset Instance-Level Partonomy Discovery
    6.4.3 Assertion of schema level links
  6.5 Related Work

7 Querying Partonomical Relationship on LOD cloud
  7.1 Introduction
  7.2 Background
  7.3 Challenges
  7.4 PARQ Approach
    7.4.1 System Architecture
      7.4.1.1 Mapping Repository
      7.4.1.2 Transformation Rule Generator
      7.4.1.3 Query Re-writer
    7.4.2 Meta-level Transformation Rules
    7.4.3 Algorithm
      7.4.3.1 Explanation
  7.5 Evaluation
    7.5.1 Geonames Results and Discussion
    7.5.2 Administrative Geography Ontology Results and Discussion
    7.5.3 Summary of Results and Limitations
  7.6 Related Work
  7.7 Conclusion

8 LOQUS: Linked Open Data SPARQL Querying System
  8.1 Introduction
  8.2 Motivation
  8.3 Our Approach
  8.4 Evaluation
  8.5 Related Work
  8.6 Conclusion and Future Work

9 Conclusion
  9.1 Summary
  9.2 Further Work
    9.2.1 Richer Relationship Identification on LOD
    9.2.2 Yellow Pages for LOD
    9.2.3 Flexible Question Answering using LOD
    9.2.4 Property Matching on LOD
    9.2.5 LOD Integration and Enhancement
  9.3 Final Remarks

Bibliography

List of Figures

2.1 Example of an ontology. Source: http://knoesis.org/research/semweb/projects/stt/
2.2 Example of an RDF Graph. Source: http://www.w3.org/TR/rdf-primer/
3.1 RDF Interlinking between different datasets using
3.2 Datasets available as part of LOD in May 2007
3.3 Datasets available as part of LOD in 2009
3.4 Possible LOD integration with SUMO
4.1 BLOOMS trees for Jazz Festival with sense Jazz Festival and for Event with sense Event. To save space, some categories are not expanded to level 4.
5.1 BLOOMS+ trees for Record Label and Music Company
6.1 PLATO System Architecture
7.1 PARQ system flowchart
7.2 PARQ Results on Geonames
7.3 Comparison of PSPARQL and PARQ on Geonames for respondent 4
7.4 Comparison for Ordnance Survey Dataset for Respondent 4
8.1 LOQUS Architecture

List of Tables

3.1 Some Datasets that are Part of LOD Cloud
4.1 Results on the oriented matching track. Results for RiMOM and AROMA have been taken from the OAEI 2009 website. Legends: Prec=Precision, Rec=Recall, A-API=Alignment API, OMV=OMViaUO, NaN=division by zero, likely due to empty alignment.
4.2 Comparison of various systems on the benchmark track. Results for RiMOM and AROMA have been reused from the OAEI 2009 website. Legends: Prec=Precision, Rec=Recall.
5.1 Common nodes between the two trees in Figure 5.3.2, and their depth. The first column gives the common nodes between the two trees rooted at Record Label and Music Industry. The second column gives the depth (the distance from root) of these nodes in the BLOOMS+ tree rooted at Record Label, i.e. the source tree.
5.2 Sample mappings of LOD ontologies to PROTON.
5.3 Results for various solutions on the task of aligning LOD schemas to PROTON. Legend: S-Match-M=Result of S-Match Minimal Set, S-Match-C=Result of S-Match Complete Set, Prec=Precision, Rec=Recall, F=F-Measure, PRO=PROTON Ontology, FB=Freebase Ontology, DB=DBpedia Ontology, GEO=Geonames Ontology
5.4 Sample of correct mappings from LOD ontologies to PROTON generated by BLOOMS+.
5.5 Sample of incorrect mappings from LOD ontologies to PROTON generated by BLOOMS+.
6.1 Six types of partonomic relations with relational elements
6.2 Precision of the six different relation types between DBpedia entities
6.3 This table shows PLATO's performance on precision and recall for the Dish-Ingredient task, and PLATO's performance on precision for the Anatomy-Organ task. Recall was not reported for the second task because of time and resource limitations.
6.4 Precision as measured on Schema Level Links Between DBpedia entities
7.1 Important Properties in Geonames
7.2 Important Properties in Administrative Geography Ontology
8.1 Result execution of queries over geonames
8.2 Result execution of queries over dbpedia
8.3 Result execution of queries over linkedmdb
8.4 Result of user submitted query
8.5 Result execution of queries using LOQUS
8.6 Comparison of LOD SPARQL Query Processing Systems