
The Web of Data PDF

689 Pages·2020·15.16 MB·English

Preview The Web of Data

The Web of Data

Aidan Hogan
Department of Computer Science, Universidad de Chile, Santiago de Chile, Chile

ISBN 978-3-030-51579-9
ISBN 978-3-030-51580-5 (eBook)
https://doi.org/10.1007/978-3-030-51580-5

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

For all those without whom this book would not have been possible. To all those without whom it would have been finished sooner.
Preface

The idea of a “Web of Data”, as discussed in this book, has been around since at least 1998, when Berners-Lee referenced his plan “for achieving a set of connected applications for data on the Web in such a way as to form a consistent logical web of data”. Myriad developments have taken place since then towards realising a Web of Data, including standards, languages, protocols, tools, optimisations, theorems, and more besides. These developments have more recently been deployed on millions of websites, where most users interact with the Web of Data on a daily basis, perhaps without realising it.

The core objective of the Web of Data is to publish content on the Web in formats that machines can process more easily and accurately than the human-friendly HTML documents forming the current “Web of Documents”. As the Web becomes increasingly machine readable, increasingly complex tasks can be automated on the Web, yielding more and more powerful Web applications that are capable of discovering, cross-referencing, filtering, and organising data from numerous websites in a matter of seconds.

Assume, for example, that we are running a recipe website. Rather than only describing the recipe in the text of a paragraph – like “This easy dessert recipe for Pecan Pie takes around 20–25 minutes ...” – the first step of the Web of Data is to publish structured Data about the recipe that is easier for machines to process than natural language. The data can be populated from the underlying database we use for the website, and can be embedded within the webpages we already have, or can be published as separate documents. Ideally we can use a data format that is also used by other recipe sites, such that external applications – e.g., search engines – can compile together these data from different sources, allowing users to quickly Query for recipes by type, difficulty, duration, etc. Such applications will then point users back to our website for more details about the recipes of interest to them.
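The first two steps of the recipe scenario (publishing structured data in a shared format, then querying across sites) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Recipe` fields, the `find_recipes` function, the `example` URLs, and all sample records other than the Pecan Pie are hypothetical and are not taken from the book, which goes on to use RDF and SPARQL for these purposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """One structured record per recipe; all field names are illustrative."""
    title: str
    kind: str        # e.g. "dessert"
    difficulty: str  # e.g. "easy"
    minutes: int     # total preparation time
    url: str         # page on the publishing site with the full recipe


# Hypothetical data, as two independent sites might publish it in a shared format.
SITE_A = [
    Recipe("Pecan Pie", "dessert", "easy", 25, "https://site-a.example/pecan-pie"),
    Recipe("Beef Wellington", "main", "hard", 150, "https://site-a.example/wellington"),
]
SITE_B = [
    Recipe("Lemon Sorbet", "dessert", "easy", 15, "https://site-b.example/sorbet"),
    Recipe("Croquembouche", "dessert", "hard", 240, "https://site-b.example/croquembouche"),
]


def find_recipes(sources, kind=None, difficulty=None, max_minutes=None):
    """Merge records from several sites and keep those matching every given filter."""
    matches = []
    for site in sources:
        for recipe in site:
            if kind is not None and recipe.kind != kind:
                continue
            if difficulty is not None and recipe.difficulty != difficulty:
                continue
            if max_minutes is not None and recipe.minutes > max_minutes:
                continue
            matches.append(recipe)
    # Shortest recipes first, so the quickest options are listed at the top.
    return sorted(matches, key=lambda r: r.minutes)


results = find_recipes([SITE_A, SITE_B], kind="dessert", difficulty="easy", max_minutes=30)
for recipe in results:
    # Each result points the user back to the site that published it.
    print(recipe.title, recipe.minutes, recipe.url)
```

Because both sites use the same record structure, the aggregating application needs no site-specific parsing, which is the benefit of a shared data format that the scenario relies on.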
Next we would like to add allergy information to allow users to filter recipes that are not safe for them to eat; for example, if a user is allergic to tree nuts, they would like to quickly rule out recipes using almonds, cashews, chestnuts, pecans, pistachios, or ingredients made using such nuts, including marzipan, certain liqueurs, etc. However, allergies are potentially complex to model in our data, and we need to avoid offering false or misleading information. Rather than manage this sort of information ourselves, we can Link the ingredients of our recipes to an official, central repository of nutritional data, which defines the Semantics of different types of foods, such as to state that cashews, chestnuts, etc., are types of tree nuts; that all tree nuts are culinary nuts; that legumes (like peanuts) are culinary nuts, but not tree nuts; etc.

What started out as some simple data about recipes slowly expands over time and becomes more and more diverse, allowing our users to find which of our recipes are of more relevance to them through increasingly detailed queries. But as new recipes are added, we start to find recipes with incomplete data (e.g., durations are missing in some cases), or inconsistent information (e.g., recipes requiring a pasta machine are marked “easy”). Hence we start to implement mechanisms for Validation, checking that every recipe has a title, a description, a duration, etc., that recipes marked as easy require equipment from a common list of utensils, etc. These validation mechanisms help us to identify, prioritise and address potential quality issues that might affect our users as the recipe data expand and grow more diverse.

Finally, we will consider a user who has a (hypothetical, as of yet) personal software agent with local access to sensitive data, such as the user’s location, medical issues, culinary preferences, available utensils, etc.
Based on a query from the user, such an agent may then venture out on this expanding Web of Data, discover recipe sites like ours, discover product lists of local supermarkets, discover nutritional recommendations associated with the users’ medical issues (if any), and pull all of these data together in order to recommend a list of safe recipes they can make at home with ingredients available for delivery from a nearby supermarket. The user can then browse this list of recipes, ordering or filtering by price, duration, calories, etc., perhaps even visiting our website for the details of a recipe that catches their eye.

The scenario we describe here, of course, is not nearly as straightforward to put into practice as we seem to suggest. Considering each part in detail raises various technical questions and challenges regarding how we structure data, how queries should be formulated, how links to remote data can be specified, the types of semantics we need, how constraints are formulated for validation, and so forth. However, there have also been over two decades of work on proposals to address these questions, particularly within the Semantic Web community, which has brought forward the most concrete and complete instantiation of a Web of Data seen to date, with detailed standards and recommendations relating to Data, Queries, Links, Semantics, Validation and more besides. These proposals have been adopted on the Web, with the standards proposed for data being promoted by popular search engines and used on millions of websites. Standards for queries, links, semantics and validation are also being increasingly adopted for specific use-cases.
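The semantics and validation steps of the recipe scenario can also be illustrated with a small sketch. It assumes a hand-written type hierarchy and a made-up list of common utensils; the names `SUBCLASS_OF`, `COMMON_UTENSILS`, `is_safe` and `validate` are hypothetical, and the book itself covers dedicated languages for these tasks (RDFS and OWL for semantics, SHACL and ShEx for validation) rather than ad hoc code of this kind.

```python
# Hypothetical food-type hierarchy: each term maps to its direct parent types.
SUBCLASS_OF = {
    "cashew": {"tree nut"},
    "chestnut": {"tree nut"},
    "pecan": {"tree nut"},
    "tree nut": {"culinary nut"},
    "peanut": {"culinary nut"},  # a culinary nut, but not a tree nut
}

# Hypothetical list of utensils that an "easy" recipe may assume.
COMMON_UTENSILS = {"bowl", "whisk", "oven", "saucepan"}
REQUIRED_FIELDS = ("title", "description", "duration")


def ancestors(term):
    """Return every type reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_safe(ingredients, allergen):
    """True if no ingredient is the allergen or one of its subtypes."""
    return all(
        ingredient != allergen and allergen not in ancestors(ingredient)
        for ingredient in ingredients
    )


def validate(recipe):
    """Return a list of human-readable problems found in one recipe record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not recipe.get(field)]
    if recipe.get("difficulty") == "easy":
        unusual = set(recipe.get("equipment", ())) - COMMON_UTENSILS
        if unusual:
            problems.append(
                "easy recipe needs uncommon equipment: " + ", ".join(sorted(unusual))
            )
    return problems


pasta = {
    "title": "Fresh Tagliatelle",
    "description": "Egg pasta rolled and cut by machine.",
    "difficulty": "easy",
    "equipment": ["bowl", "pasta machine"],
}
print(is_safe(["pecan", "sugar"], "tree nut"))
print(validate(pasta))
```

Note that `is_safe` never lists pecans as unsafe explicitly: the conclusion follows from the hierarchy, which is why linking to a shared, authoritative source of such definitions is preferable to each site maintaining its own.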
The primary undertaking of this book is then to draw together these proposals, to motivate and describe them in detail, to provide examples of their use, and to discuss how they contribute to – and how they have been used thus far on – the Web of Data. The book is aimed at students, researchers and practitioners interested in learning more about the Web of Data, and about closely related topics such as the Semantic Web, Knowledge Graphs, Linked Data, Graph Databases, Ontologies, etc. The book can serve as a textbook for students and other newcomers, where it motivates the topics discussed, offers accessible examples and exercises, etc.; it can also serve as a reference handbook for researchers and developers, where it offers up-to-date details of key standards (RDF, RDFS, OWL, SPARQL, SHACL, ShEx, RDB2RDF, LDP), along with formal definitions and references to further literature.

The book is structured around nine chapters, as follows:

Introduction: Chapter 1 introduces the book, discussing the shortcomings of the current Web that illustrate the need for a Web of Data.
Web of Data: Chapter 2 provides an overview of the fundamental concepts underlying the Web of Data, and discusses some current use-cases on the Web where such concepts are already deployed.
Resource Description Framework (RDF): In Chapter 3, we describe RDF: the graph-structured data model proposed by the Semantic Web community as a common data model for the Web.
RDF Schema (RDFS) and Semantics: In Chapter 4, we describe RDFS: a lightweight ontology language used to define an initial semantics for terms used in RDF graphs.
Web Ontology Language (OWL): In Chapter 5, we describe OWL: a more expressive ontology language built upon RDFS that offers much more powerful ontological features.
SPARQL Query Language: In Chapter 6, we describe a language for querying and updating RDF graphs, with examples of the features it supports, and a detailed definition of its semantics.
Shape Constraints and Expressions (SHACL/ShEx): Chapter 7 introduces two languages for describing the expected structure of – and expressing constraints on – RDF graphs for the purposes of validation.
Linked Data: In Chapter 8, we discuss the principles and best practices proposed by the Linked Data community for publishing interlinked (RDF) data on the Web, and how those techniques have been adopted.
Conclusions: In Chapter 9, we wrap up with open problems and more general discussion on the future of the Web of Data.

A website associated with the book – http://webofdatabook.org/ – contains complementary material, including solutions for exercises, slides for classes, raw data for examples, a comments section, and more besides.

Santiago, Chile; April 2020
Aidan Hogan

Acknowledgements

First, I would like to thank colleagues and students past and present, who have introduced me to this topic, worked with me on this topic, or whom I have introduced to this topic. I have learnt a lot from all of you.

I would like to thank Thanh Tran and Gong Cheng who were instrumental in making this book happen.

I would like to thank Ralf Gerstner of Springer for his guidance, support, and immense patience during the lengthy preparation of this book.

I would like to thank the observant and patient reviewer whose detailed feedback helped to improve this manuscript.

I would like to thank family and friends for reminding me on occasion that there was perhaps more to life than this book and its topic.

Finally, thanks to Analí for supporting me throughout this journey, and doing all of those important things, both big and small, that needed to be done so that I could keep my head buried in this endeavour.

Contents

1 Introduction
  1.1 The Latent Web
  1.2 The Current Web
    1.2.1 Hypertext Markup Language (HTML)
    1.2.2 Interpreting HTML Content
2 Web of Data
  2.1 Overview
  2.2 Web of Data: Concepts
    2.2.1 Data
    2.2.2 Queries
    2.2.3 Semantics
    2.2.4 Constraints
    2.2.5 Links
  2.3 The Current Web of Data
    2.3.1 Wikidata
    2.3.2 Knowledge Graphs
    2.3.3 Schema.org and the Open Graph Protocol
    2.3.4 Linking Open Data
  2.4 Summary
  2.5 Discussion
3 Resource Description Framework
  3.1 Overview
  3.2 Terms
    3.2.1 Internationalised Resource Identifiers (IRIs)
    3.2.2 Literals
    3.2.3 Blank Nodes
    3.2.4 Defining RDF Terms
  3.3 Triples
  3.4 Graphs
