Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora

Aidan Hogan

Supervisor: Dr. Axel Polleres
Internal Examiner: Prof. Stefan Decker
External Examiner: Prof. James A. Hendler

Dissertation submitted in pursuance of the degree of Doctor of Philosophy

Digital Enterprise Research Institute, Galway
National University of Ireland, Galway / Ollscoil na hÉireann, Gaillimh

April 11, 2011

Copyright © Aidan Hogan, 2011

The research presented herein was supported by an IRCSET Postgraduate Scholarship and by Science Foundation Ireland under Grant No. SFI/02/CE1/I131 (Lion) and Grant No. SFI/08/CE/I1380 (Lion-2).

“If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.”
—George Bernard Shaw

Acknowledgements

First, thanks to the taxpayers for the pizza and (much needed) cigarettes;
...thanks to friends and family;
...thanks to the various students and staff of DERI;
...thanks to the URQ folk;
...thanks to people with whom I have worked closely, including Alex, Antoine, Jeff, Luigi and Piero;
...thanks to people with whom I have worked very closely, particularly Andreas and Jürgen;
...thanks to John and Stefan for the guidance;
...thanks to Jim for the patience and valuable time;
...and finally, a big thanks to Axel for everything.

Abstract

The Web contains a vast amount of information on an abundance of topics, much of which is encoded as structured data indexed by local databases. However, these databases are rarely interconnected, and information reuse across sites is limited. Semantic Web standards offer a possible solution in the form of an agreed-upon data model and set of syntaxes, as well as metalanguages for publishing schema-level information, offering a highly interoperable means of publishing and interlinking structured data on the Web. Thanks to the Linked Data community, an unprecedented lode of such data has now been published on the Web—by individuals, academia, communities, corporations and governmental organisations alike—on a medley of often overlapping topics.

This new publishing paradigm has opened up a range of new and interesting research topics with respect to how this emergent “Web of Data” can be harnessed and exploited by consumers. Indeed, although Semantic Web standards theoretically enable a high level of interoperability, heterogeneity still poses a significant obstacle when consuming this information: in particular, publishers may describe analogous information using different terminology, or may assign different identifiers to the same referents. Consumers must also overcome the classical challenges of processing Web data sourced from multitudinous and unvetted providers: primarily, scalability and noise.

In this thesis, we look at tackling the problem of heterogeneity with respect to consuming large-scale corpora of Linked Data aggregated from millions of sources on the Web. As such, we design bespoke algorithms—in particular, based on the Semantic Web standards and traditional Information Retrieval techniques—which leverage the declarative schemata (a.k.a. terminology) and various statistical measures to help smooth out the heterogeneity of such Linked Data corpora in a scalable and robust manner. All of our methods are distributed over a cluster of commodity hardware, which typically allows for enhancing performance and/or scale by adding more machines.
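As a flavour of the kind of schema-leveraging inference described above, the following minimal sketch (not the thesis's actual distributed implementation) forward-chains two standard RDFS entailment rules, rdfs9 and rdfs11, to a fixpoint over a toy set of triples; the ex: identifiers and the naive in-memory algorithm are purely illustrative assumptions.

```python
# Illustrative sketch of rule-based materialisation over RDF triples,
# applying two standard RDFS entailment rules:
#   rdfs9:  (?c1 rdfs:subClassOf ?c2), (?s rdf:type ?c1)        -> (?s rdf:type ?c2)
#   rdfs11: (?c1 rdfs:subClassOf ?c2), (?c2 rdfs:subClassOf ?c3) -> (?c1 rdfs:subClassOf ?c3)
# All ex: IRIs below are hypothetical.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def materialise(triples):
    """Naive forward chaining to a fixpoint over rdfs9 and rdfs11."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        types = [(s, o) for s, p, o in closure if p == RDF_TYPE]
        new = set()
        for c1, c2 in subclass:
            for c2b, c3 in subclass:          # rdfs11: subclass transitivity
                if c2 == c2b:
                    new.add((c1, SUBCLASS, c3))
            for s, c in types:                # rdfs9: type propagation
                if c == c1:
                    new.add((s, RDF_TYPE, c2))
        fresh = new - closure
        if fresh:
            closure |= fresh
            changed = True
    return closure

triples = {
    ("ex:Aidan", RDF_TYPE, "ex:PhDStudent"),
    ("ex:PhDStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
}
for t in sorted(materialise(triples) - triples):
    print(t)  # e.g. infers ("ex:Aidan", "rdf:type", "ex:Person")
```

In practice, materialisation at the scale discussed in this thesis cannot use such a quadratic in-memory loop; the sketch only conveys what "inferring new information from declarative schemata" means at the level of individual rules.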
We first present a distributed crawler for collecting a generic Linked Data corpus from millions of sources; we perform an open crawl to acquire an evaluation corpus for our thesis, consisting of 1.118 billion facts collected from 3.985 million individual documents hosted by 783 different domains. Thereafter, we present our distributed algorithm for performing a links-based analysis of the data sources (documents) comprising the corpus, where the resultant ranks are used in subsequent chapters as an indication of the importance and trustworthiness of the information they contain. Next, we look at custom techniques for performing rule-based materialisation, leveraging RDFS and OWL semantics to infer new information, often using mappings—provided by the publishers themselves—to translate between different terminologies. Thereafter, we present a formal framework for incorporating meta-information—relating to trust, provenance and data quality—into this inferencing procedure; in particular, we derive and track ranking values for facts based on the sources they originate from, later using them to repair identified noise (logical inconsistencies) in the data. Finally, we look at two methods for consolidating coreferent identifiers in the corpus, and we present an approach for discovering and repairing incorrect coreference through analysis of inconsistencies.

Throughout the thesis, we empirically demonstrate our methods against our real-world Linked Data corpus, and on a cluster of nine machines.

Declaration

I declare that this thesis is composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

Aidan Hogan
April 11, 2011