ebook img

Validating RDF Data PDF

308 Pages·2017·2.505 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Validating RDF Data

Validating RDF Data Jose Emilio Labra Gayo UniversityofOviedo Eric Prud’hommeaux W3C/MITandMicelio Iovka Boneva UniversityofLille Dimitris Kontokostas UniversityofLeipzig SYNTHESISLECTURESONSEMANTICWEB: THEORYAND TECHNOLOGY#16 M &C Morgan&cLaypool publishers Copyright©2018byMorgan&Claypool ValidatingRDFData JoseEmilioLabraGayo,EricPrud’hommeaux,IovkaBoneva,andDimitrisKontokostas www.morganclaypool.com ISBN:9781681731643 paperback ISBN:9781681731650 ebook ISBN:9781681731667 e-pub ABSTRACT RDFandLinkedDatahavebroadapplicabilityacrossmanyfields,fromaircraftmanufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describesitspracticaluseinday-to-daydataexchange. TheSemanticWeboffersabold,newtakeonhowtoorganize,distribute,index,andshare data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributeddatabasesonaglobalscale.LiketheWeb,theSemanticWebisheraldedasaninfor- mationrevolution,andalsoliketheWeb,itisencumberedbydataqualityissues.Thequalityof SemanticWebdataiscompromisedbythelackofresourcesfordatacuration,formaintenance, andfordevelopinggloballyapplicabledatamodels. Attheenterprisescale,theseproblemshaveconventionalsolutions.Masterdatamanage- ment provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages pro- videmodelsandvocabulariesforexpressingsuchstructuralconstraints. ThisbookdescribestwotechnologiesforRDFvalidation:ShapeExpressions(ShEx)and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two,andsomeexampleapplications. KEYWORDS RDF,ShEx,SHACL,shapeexpressions,shapesconstraintlanguage,dataquality, webofdata,SemanticWeb,linkeddata Contents Preface ........................................................... xv ForewordbyPhilArcher ...........................................xvii ForewordbyTomBaker............................................ xix ForewordbyDanBrickleyandLibbyMiller ........................... xxi 1 Introduction .......................................................1 1.1 RDFandtheWebofData ......................................... 1 1.2 RDF:TheGoodParts ............................................. 1 1.3 ChallengesforRDFAdoption ...................................... 3 1.4 StructureoftheBook.............................................. 5 1.5 ConventionsandNotation.......................................... 6 2 TheRDFEcosystem.................................................9 2.1 RDFHistory .................................................... 9 2.2 RDFDataModel................................................ 10 2.3 SharedEntitesandVocabularies .................................... 17 2.4 TechnologiesRelatedwithRDF .................................... 18 2.4.1 SPARQL................................................. 18 2.4.2 InferenceSystems:RDFSchemaandOWL..................... 20 2.4.3 LinkedData,JSON-LD,Microdata,andRDFa.................. 23 2.5 Summary ...................................................... 25 2.6 SuggestedReading ............................................... 25 3 DataQuality ......................................................27 3.1 Non-RDFSchemaLanguages ..................................... 28 3.1.1 UML .................................................... 28 3.1.2 SQLandRelationalDatabases................................ 29 3.1.3 XML .................................................... 31 3.1.4 JSON.................................................... 37 3.1.5 CSV..................................................... 39 3.2 UnderstandingtheRDFValidationProblem .......................... 40 3.3 PreviousRDFValidationApproaches................................ 45 3.3.1 Query-basedValidation ..................................... 45 3.3.2 Inference-basedApproaches.................................. 47 3.3.3 StructuralLanguages ....................................... 48 3.4 ValidationRequirements .......................................... 48 3.4.1 GeneralRequirements ...................................... 49 3.4.2 Graph-basedRequirements .................................. 49 3.4.3 RDFDataModelRequirements .............................. 50 3.4.4 Data-modeling-basedRequirements ........................... 50 3.4.5 ExpressivenessofSchemaLanguage ........................... 51 3.4.6 ValidationInvocationRequirements ........................... 52 3.4.7 UsabilityRequirements...................................... 52 3.5 Summary ...................................................... 52 3.6 SuggestedReading ............................................... 53 4 ShapeExpressions .................................................55 4.1 UseofShEx .................................................... 55 4.2 FirstExample ................................................... 56 4.3 ShEximplementations............................................ 58 4.4 TheShapeExpressionsLanguage ................................... 59 4.4.1 ShapeExpressionsCompactSyntax............................ 59 4.4.2 InvokingValidation......................................... 60 4.4.3 StructureofShapeExpressions ............................... 63 4.4.4 StartShapeExpression ...................................... 64 4.5 NodeConstraints ................................................ 65 4.5.1 Nodekinds ............................................... 67 4.5.2 Datatypes................................................. 68 4.5.3 FacetsonLiterals .......................................... 70 4.5.4 ValueSets ................................................ 73 4.6 Shapes......................................................... 78 4.6.1 TripleConstraints .......................................... 79 4.6.2 Groupings ................................................ 80 4.6.3 Cardinalities .............................................. 80 4.6.4 Choices .................................................. 82 4.6.5 NestedShapes ............................................. 84 4.6.6 InverseTripleConstraints.................................... 85 4.6.7 RepeatedProperties ........................................ 86 4.6.8 PermittingotherTriples ..................................... 87 4.7 References...................................................... 90 4.7.1 ShapeReferences........................................... 90 4.7.2 RecursionandCyclicReferences .............................. 91 4.7.3 ExternalShapes............................................ 92 4.7.4 LabeledTripleExpression ................................... 93 4.7.5 Annotations............................................... 94 4.8 LogicalOperators................................................ 95 4.8.1 Conjunction............................................... 95 4.8.2 Disjunction ............................................... 98 4.8.3 Negation ................................................ 101 4.9 ShapeMaps ................................................... 105 4.9.1 FixedShapeMaps......................................... 105 4.9.2 QueryShapeMaps ........................................ 106 4.9.3 ResultShapeMaps ........................................ 108 4.9.4 JSONRepresentation ...................................... 109 4.9.5 ChainingValidationWorkflows .............................. 110 4.10 SemanticActions ............................................... 110 4.11 ShExandInference ............................................. 111 4.12 Importingschemas.............................................. 113 4.13 RDFandJSON-LDSyntax ...................................... 114 4.14 Summary ..................................................... 116 4.15 SuggestedReading .............................................. 116 5 SHACL .........................................................119 5.1 SimpleExample ................................................ 119 5.2 SHACLImplementations ........................................ 122 5.3 BasicDefinitions:ShapesGraphs,Node,andPropertyShapes........... 124 5.4 ImportingotherShapesGraphs ................................... 125 5.5 ValidationReport ............................................... 126 5.6 Shapes........................................................ 129 5.6.1 Nodeshapes.............................................. 129 5.6.2 PropertyShapes........................................... 130 5.6.3 ConstraintComponents .................................... 131 5.6.4 HumanFriendlyMessages .................................. 133 5.6.5 DeclaringShapeSeverities .................................. 134 5.6.6 DeactivatingShapes ....................................... 135 5.7 TargetDeclarations ............................................. 137 5.7.1 TargetNode.............................................. 137 5.7.2 TargetClass.............................................. 138 5.7.3 ImplicitClassTarget....................................... 139 5.7.4 TargetSubjectsOf ........................................ 140 5.7.5 TargetObjectsOf ......................................... 141 5.8 Cardinality .................................................... 141 5.9 ConstraintsonValues............................................ 142 5.9.1 Datatypes................................................ 142 5.9.2 ClassofValues............................................ 145 5.9.3 NodeKinds .............................................. 146 5.9.4 SetsofValues............................................. 147 5.9.5 SpecificValue ............................................ 148 5.10 DatatypeFacets ................................................ 148 5.10.1ValueRanges ............................................. 149 5.10.2String-basedConstraints ................................... 149 5.10.3Language-basedConstraints ................................ 151 5.11 LogicalConstraints:and,or,not,xone .............................. 154 5.11.1AND ................................................... 154 5.11.2OR..................................................... 157 5.11.3ExactlyOne.............................................. 159 5.11.4Not..................................................... 162 5.11.5CombiningLogicalOperators ............................... 162 5.12 Shape-basedConstraints ......................................... 164 5.12.1ShapeReferencesandRecursion ............................. 166 5.12.2QualifiedValueShapes ..................................... 174 5.13 ClosedShapes ................................................. 177 5.14 PropertyPairConstraints......................................... 180 5.15 Non-validatingSHACLProperties................................. 182 5.16 SHACL-SPARQL ............................................. 184 5.16.1SPARQLConstraints...................................... 184 5.16.2SPARQL-basedConstraintComponents ...................... 185 5.17 SHACLandInferenceSystems.................................... 188 5.18 SHACLCompactSyntax ........................................ 190 5.19 SHACLRulesandAdvancedFeatures.............................. 190 5.20 SHACLJavascript .............................................. 193 5.21 Summary ..................................................... 194 5.22 SuggestedReading .............................................. 194 6 Applications .....................................................195 6.1 DescribingaLinkedDataPortal................................... 195 6.1.1 WebIndexinShEx ........................................ 197 6.1.2 WebIndexinSHACL...................................... 200 6.2 DescribingClinicalRecords—FHIR ............................... 204 6.2.1 FHIRasLinkedData...................................... 206 6.2.2 Consistencyconstraints..................................... 206 6.2.3 FHIR/RDFDevelopment .................................. 209 6.2.4 GenericProperties......................................... 210 6.3 SpringerNatureSciGraph ........................................ 212 6.4 DBpediaValidationUseCases .................................... 213 6.4.1 Ontology-basedValidation.................................. 213 6.4.2 RDFMappingsValidation .................................. 214 6.4.3 ValidatingLinkContributionswithSHACL ................... 215 6.4.4 OntologyValidationwithSHACL ........................... 216 6.5 ShExforShEx ................................................. 219 6.6 SHACLinSHACL............................................. 225 6.7 Summary ..................................................... 230 6.8 SuggestedReading .............................................. 231 7 ComparingShExandSHACL ......................................233 7.1 CommonFeatures .............................................. 233 7.2 SyntacticDifferences ............................................ 237 7.3 Foundation:Schemavs.Constraints ................................ 239 7.4 InvokingValidation ............................................. 240 7.5 ModularizationandReusability.................................... 242 7.6 Shapes,Classes,andInference..................................... 244 7.7 ViolationReportingandSeverities ................................. 246 7.8 DefaultCardinalities ............................................ 246 7.9 PropertyPaths ................................................. 247 7.10 Recursion ..................................................... 248 7.11 PropertyPairConstraintsandUniqueness ........................... 250 7.12 RepeatedProperties ............................................. 251 7.13 ExactlyOneandAlternatives ..................................... 254 7.14 TreatmentofClosedShapes ...................................... 257 7.15 StemsandStemRanges.......................................... 259 7.16 Annotations ................................................... 260 7.17 SemanticsandComplexity........................................ 261 7.18 ExtensionMechanisms .......................................... 262 7.19 ConclusionsandOutlook ........................................ 263 7.20 Summary ..................................................... 266 7.21 SuggestedReading .............................................. 266 A WebIndexinShEx ................................................267 B WebIndexinSHACL..............................................269 C ShExinShEx ....................................................275 D SHACLinSHACL ...............................................279 Bibliography .....................................................285 Preface This book describes two languages for implementing constraints on RDF data, describing the main features of both Shape Expressions (ShEx) and Shapes Constraint Language (SHACL) from a user perspective, and also offering a comparison of the technologies. Throughout this book,wedevelopa small number of examples that typify validation requirementsand demon- strate how they can be met with ShEx and SHACL. The book is not intended to be a formal specification of the languages, for which the interested reader can consult the corresponding referencedocuments,butrather,itismeanttoserveasanintroductiontothetechnologieswith somebackgroundabouttherationaleoftheirdesignandsomepointsofcomparison. Chapter 1 provides a brief introduction to the topic. Chapter 2 presents a short overviewoftheRDFdatamodelandRDF-relatedtechnologies;thischaptercouldbeskipped byanyreaderwhoalreadyknowsRDForTurtle.Chapter3helpsthereadertounderstandwhat toexpectfromdatavalidation.ItdescribestheproblemofRDFvalidationandsomeapproaches thathavebeenproposed.Thisbookspecificallyreviewstwooftheseapproachesinfurtherdetail: ShEx(Chapter4)andSHACL(Chapter5).Thesechaptersdescribeeachlanguageandprovide apracticalintroductionusingexamples.Followingthediscussionofbothlanguages,Chapter6 presents some applications using either ShEx, SHACL, or both. Finally, Chapter 7 compares ShExandSHACLandofferssomeconclusions. The goal of this book is to serve as a practical introduction to ShEx and SHACL using examples.Whileweomittedformaldefinitionsorspecifications,referencesforfurtherreading can be found at the end of each chapter. We give a quick overview of some background and related technologies so that readers without RDF knowledge can follow the book’s contents. Also,itisnotnecessarytohaveanypriorknowledgeofprogrammingorontologiestounderstand RDFvalidationtechnologies.Theintendedaudienceisanyoneinterestedindatarepresentation andquality. JoseEmilioLabraGayo,EricPrud’hommeaux,IovkaBoneva,andDimitrisKontokostas July2017

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.