Library of Congress Classification as linked data

Kevin Ford

JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013). DOI: 10.4403/jlis.it-5465

What is linked data

The Library of Congress has published a select number of classes from the Library of Congress Classification (LCC) system as linked data as a new offering of its Linked Data Service,[1] commonly known as id.loc.gov. The offering, while still considered a beta project, provides URIs for resources that represent a simplified version of the underlying data found in the source MARC Classification records. The beta service also furnishes URIs for classification number resources that either derive directly from the underlying data or are the result of a synthesis between a schedule resource and a table resource. Although the data are presented in MADS/RDF[2] and SKOS[3] where appropriate, LCC as linked data is accompanied by a small LCC ontology to more accurately describe the types of classification resources and the relationships between them, especially where MADS/RDF and SKOS Class and Property definitions were seen as insufficient. This paper explores the publication of LCC as linked data and the accompanying ontology by contextualizing them with respect to prior efforts representing LCC as linked data, representing Dewey as linked data, and the appropriateness of SKOS for library classification data, especially given the historical need for a distinct MARC format for Classification.

[1] http://id.loc.gov
[2] http://www.loc.gov/mads/rdf
[3] http://www.w3.org/2004/02/skos

The Library of Congress classification system has existed since the late nineteenth century “to organize and arrange the book collections of the Library of Congress” (Library of Congress Classification). The system is organized into twenty-one classes, most of which are further divided into subclasses. Each class represents a field of knowledge, such as Art, Law, or History. Each subclass is further divided into more specific topics that basically adhere to a hierarchical representation of the field of knowledge. Like most classification systems, LCC is subject-based.
The resulting “number”, therefore, represents a distinct topic within the field of knowledge. For decades LCC has been printed, bound, and distributed (at cost, basically) and still is today. One may acquire, for a price, the entire 41-volume set or one may choose individual classes or schedules. LCC is also accessible via Classification Web,[4] which is a sophisticated web application designed to assist catalogers with the assignment and creation of LCC classification numbers. It is offered as a subscription service for which LC charges a fee. Also for cost (basically), the Library of Congress Classification is available in MARC21 format and is made available as a bulk download, with periodic updates, from the Library’s Cataloging and Distribution Service. Notably, the raw data, though available, require purchase and are not presented in accordance with linked data methods and principles.

The Library of Congress Classification as linked data does have a history, albeit a short and little known one. Karen Coyle laboriously scraped the first four levels (more or less) of all LC Classification classes from PDF documents hosted on the LC website to a plain text file (that is, something far more accessible for machines) and uploaded the resulting text file to archive.org.[5] This work dates to, and therefore the data predates, September 2007.[6] The PDF documents, which are still available (though perhaps updated since), present a detailed outline of LCC. Ed Summers then took the text file, generated a basic SKOS RDF representation from it, and developed a very simple website where he published the SKOS data.[7] This work was little publicized, but it is still active and accessible. Summers’s code is on GitHub.[8]

[4] https://classificationweb.net

Coyle’s text file simply lists the classes (A, B, C, and so on) and the first three levels, if appropriate, of each subclass (AC, AE, AG, and so on). The concept’s label at any given level is matched with the class number. Because only the first few levels of LCC are outlined, most classification numbers represent a range of more specific topics.
Missing – nearly universally – from the detailed outline are language-specific divisions within topics, temporal divisions within topics, and form divisions within topics, in addition to simply greater granularity and specificity, such as the distinction between “General works” and “Special topics.” From Coyle’s text file, Summers generated a skos:Concept resource for each classification number and associated label. He took each classification number and appended it to a base HTTP URI (in a namespace he controls) to create a unique identifier for the resource, and he made the lexical label for the topic (and class number) the skos:prefLabel. He generated skos:broader and skos:narrower relationships between classification topics when the classification number represented an encompassing range or a more specific range respectively. Summers created something akin to an LCSH-like pre-coordinated heading with the labels of narrower topics (i.e. those that fit contextually with broader topics): the skos:prefLabel of a narrower topic contains the labels of its broader relations, separated by two hyphens. The data collected by Coyle, which may have been all that was reasonably possible to collect, were limited to a class number, label, and hierarchy.

[5] http://ia600304.us.archive.org/0/items/LcClassificationA-z/lc_class.txt
[6] http://ia600304.us.archive.org/0/items/LcClassificationA-z/LcClassificationA-z_meta.xml
[7] http://inkdroid.org/lcco
[8] https://github.com/edsu/lcco

The first three levels of the Dewey Decimal Classification system – the Dewey Summaries – have been available as linked data since 2009.[9] OCLC published the full Dewey Decimal Classification as linked data in Summer 2012. As with Summers’s design, each topic is a skos:Concept with broader or narrower relations to any given topic’s hierarchical relatives. Published as it was by OCLC, the available data are richer, including information about provenance and licensing (no fewer than four statements for each Concept) and creation and modification times, among a few others.
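The generation step attributed to Summers above (class number appended to a base URI, a pre-coordinated skos:prefLabel joined with two hyphens, and broader/narrower links) can be illustrated with a minimal sketch. This is not Summers's code: the base namespace, the `Entry` type, and the function names are hypothetical, and triples are modeled as plain tuples rather than with an RDF library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base namespace, used only for illustration.
BASE_URI = "http://example.org/lcco/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Entry:
    """One line of an outline: a class number (or range), its caption, its parent."""
    number: str
    label: str
    parent: Optional["Entry"] = None


def uri(entry: Entry) -> str:
    # The identifier is the class number appended to the base HTTP URI.
    return BASE_URI + entry.number


def pref_label(entry: Entry) -> str:
    # Walk up the hierarchy and join the captions, broadest first, with "--".
    parts = []
    node: Optional[Entry] = entry
    while node is not None:
        parts.append(node.label)
        node = node.parent
    return "--".join(reversed(parts))


def triples(entry: Entry) -> list:
    subject = uri(entry)
    out = [
        (subject, RDF_TYPE, SKOS + "Concept"),
        (subject, SKOS + "prefLabel", pref_label(entry)),
    ]
    if entry.parent is not None:
        parent = uri(entry.parent)
        out.append((subject, SKOS + "broader", parent))
        out.append((parent, SKOS + "narrower", subject))
    return out


if __name__ == "__main__":
    # Illustrative captions only; they are not copied from Coyle's text file.
    a = Entry("A", "General Works")
    ac = Entry("AC", "Collections", a)
    for triple in triples(ac):
        print(triple)
```

Keeping the per-level caption separate from the joined label makes it easy to compare this design with one that reserves skos:prefLabel for the caption alone.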
Unlike Summers’s design, OCLC reserved the skos:prefLabel exclusively for the lexical label of the given Concept – broader relations are not strung together with the topic’s label to create the skos:prefLabel. OCLC’s URI design patterns warrant special mention. Pains have been taken to embed some semantics into the URI pattern, reserving, essentially, one namespace each for “non-information resources (abstract or concrete real-world objects), generic resources, and their representations” (OCLC). Although some of the URI examples do not appear to function presently, the focus on URI composition and the need to represent a variety of different resource types bear on all aspects of publishing classification systems such as DDC and LCC as linked data.[10] A diverse set of resource types is also very relevant to LCC. In addition to the embedded semantics in the Dewey URIs, this issue received greater elucidation by Panzer and Zeng in two related publications (Panzer and Zeng; Zeng, Panzer, and Salaba).

[9] http://dewey.info
[10] The actual service at http://dewey.info features diverse URI patterns, all of which appear to function, for all types of information resources.

The authors explored how to model classification schemes (notably DDC) in SKOS. Among other findings, the authors discuss how classification systems include “assignable” and “non-assignable” concepts. In DDC, an example of a non-assignable concept is a centered entry, or a classification number range or span for which there are likely a number of more specific topics and, therefore, specific numbers. In LCC, this is referred to as a range. There is also the issue, as Panzer and Zeng note (2009), of synthesized concepts (a classification number and topic that are a result of combining two concepts in the classification system) and non-synthesized concepts.
One risks some semantic incoherency when attempting to model all these types of things, and to establish appropriate relationships between them, purely in SKOS. Panzer and Zeng considered the need to create, minimally, an extension to the core SKOS vocabulary, but it was clear that an altogether separate attempt might be necessary, in a namespace entirely distinct from a SKOS one, to correctly capture the semantics and relationships. These same issues also materialized during the process of trying to represent LCC in SKOS.

SKOS – the Simple Knowledge Organization System – is designed “to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading lists and taxonomies within the framework of the Semantic Web”.[11] SKOS has proven to be extremely versatile and effective at representing thesauri, subject heading lists, and taxonomies (though, in part as a result of being intentionally simple, there can be some loss of granularity with respect to library data). In fact, data represented using the MARC Format for Authority data, such as subject heading lists like LCSH, map effortlessly to SKOS. This is seen readily and simply when decomposing a MARC Authority record into MADS/RDF and SKOS. For MARC Authority, a valid (i.e. not deprecated) authority record is the Concept. The 1XX - the main heading - becomes the authoritative or preferred label. MADS/RDF provides a means to capture the type of concept, be it a Topic, Geographic, GenreForm, or Temporal notion, and a few others. MADS/RDF also provides support for better representation of pre-coordinated headings. MARC Authority 4XX fields are variant or alternate labels. 5XX fields represent various relationships between terms, of which broader and narrower relationships are the most popular. MADS/RDF added a few additional relationships, such as those needed to accurately record connections between earlier or later established concepts, and a new resource type to clearly denote deprecated resources.

[11] http://www.w3.org/2004/02/skos
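The field-by-field mapping just described (1XX to preferred label, 4XX to alternate labels, 5XX to relationships) can be sketched in a few lines. The record structure here is a simplified stand-in, not a real MARC parser: each field is a hypothetical `(tag, heading, w_code)` tuple, where `w_code` stands for the first character of subfield $w in a 5XX field ("g" for a broader term, "h" for a narrower term). Property names are written as prefixed strings, and the function name is an assumption.

```python
def authority_to_skos(uri, fields):
    """Map a simplified authority record to SKOS-style triples.

    fields: list of (tag, heading, w_code) tuples; w_code is "" when absent.
    """
    triples = [(uri, "rdf:type", "skos:Concept")]
    for tag, heading, w_code in fields:
        if tag.startswith("1"):
            # 1XX: the main heading becomes the preferred label.
            triples.append((uri, "skos:prefLabel", heading))
        elif tag.startswith("4"):
            # 4XX: variant headings become alternate labels.
            triples.append((uri, "skos:altLabel", heading))
        elif tag.startswith("5"):
            # 5XX: related headings; $w distinguishes broader from narrower.
            prop = {"g": "skos:broader", "h": "skos:narrower"}.get(w_code, "skos:related")
            # The object is left as the heading string in this sketch; a real
            # conversion would resolve it to the URI of the related concept.
            triples.append((uri, prop, heading))
    return triples


if __name__ == "__main__":
    # Illustrative record; the headings are made up for the example.
    record = [
        ("150", "Cats", ""),
        ("450", "Felis catus", ""),
        ("550", "Felidae", "g"),
        ("550", "Kittens", "h"),
    ]
    for triple in authority_to_skos("http://example.org/authorities/c1", record):
        print(triple)
```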
A number of note fields defined in MARC Authority also have one-to-one mappings to MADS/RDF and SKOS. But MADS/RDF and SKOS classes and properties have been far less amenable to classification data, or at least to library-specific classification systems such as DDC and LCC.[12] This is essentially the difficulty Panzer and Zeng encountered during their research and it is the same encountered when attempting to publish LCC as linked data. At least when it comes to library classification systems such as DDC and LCC, this is unsurprising.

The influential consideration here lies with the MARC21 format for Classification.[13] More specifically, its very existence. Formally but provisionally published in June 1990, the MARC21 Format for Classification was specifically developed to facilitate the exchange and printing of classification data, most notably LCC and DDC (Guenther). Importantly, the new MARC format was, however, the result of an attempt to modify the MARC format for Authority data (this work started in 1987/1988). After identifying most of the changes that would be required of the MARC Authority format, a draft of the proposed changes was presented to the committee overseeing changes to the MARC formats (MARBI). Following this review, and the early development period generally, it was clear that “there was less overlap with the authority format than originally anticipated, and ... [the MARC Authority] codes and conventions were too constraining” (Guenther).

[12] This probably has a lot to do with the relative complexity of classification systems, especially with respect to how classification numbers are constructed, when compared to thesauri or “subject heading lists;” the aggregate expertise of the SKOS designers and members of the working group with respect to classification systems; and, partly as a natural extension of the previous point, a certain amount of partiality and attention given to, and in favor of, thesauri and “subject heading lists” during the development of SKOS.
[13] http://www.loc.gov/marc/classification
The proposal for classification data was rewritten to be a separate format, which would become the MARC Format for Classification by 1991.

The MARC Format for Classification – and its development process – took into consideration the very same semantic difficulties encountered by Panzer and Zeng, and the present author, when faced with “skosifying” complex library classification data, a difficulty that is compounded by the unsuitable nature of the RDF data element semantics. The MARC Format for Classification can represent class schedules and tables, neither of which is necessarily assignable as is. The format can represent ranges and hierarchy. Naturally, it has full support for notes and index terms. But SKOS semantics are not rich enough for this type of information. That said, SKOS can reasonably represent (assignable) classification topics and even class number ranges. It is with this information in mind, and the background work by Panzer and Zeng, that it was decided to present LCC as linked data as much as possible in MADS/RDF and SKOS but to define a small vocabulary in OWL to faithfully represent LCC-specific data and data elements where MADS/RDF and SKOS fall short.[14]

[14] http://id.loc.gov/ontologies/lcc

Although there are a few ontological constraints on the data, constraints do not presently extend to how the data are used. For example, while it could be possible to infer “assignable” versus “non-assignable” resources from the intersection of select Classes in the ontology, this type of modeling has not been undertaken. As such, it is an experimental offering that attempts to make no semantic restrictions on its use but which strives to represent the derived and underlying data accurately. The ontology is also specific to LCC; it makes no attempt to model data elements specific to other classification systems, such as DDC. Also, though it would be unwise to rule out OCLC developing an ontology for DDC, the explicit declaration of classes in the small LCC ontology transfers the semantics embedded in dewey.info URIs to the data itself.
(“Smart” URIs and clear data semantics are not mutually exclusive and could, in fact, be complementary.)

A select number of Library of Congress Classification classes are available from LC’s Linked Data Service,[15] commonly known as id.loc.gov.[16] This offering - at the time of this publication - is very much a beta offering. During this stage, the data and their representation are subject to change, especially as more is learned about how the data are used and better ways for them to be represented are determined or developed. Nevertheless, it is an attempt not only to publish an RDF representation of the underlying data used to construct classification numbers but also to publish the classification numbers themselves. To this end, an effort has been made to apply the tables to schedules, thereby synthesizing a classification number, as appropriate.

In order not to become too mired in MADS/RDF[17] and SKOS[18] semantics and restrictions, everything is a MADS/RDF Authority and SKOS Concept, with the exception of Index Terms, which can be interpreted as variants. They are therefore instantiated as MADS/RDF Variants and SKOS/XL Alternate Labels. The authoritative label - the preferred label and the tightly controlled term - is reserved for the main caption or term. This is therefore similar to how OCLC created Dewey resources and a departure from how Summers presented the data. The full lexically represented hierarchy that one finds in the source MARC records is recorded simply as an rdfs:label so that it is still available for parsing and potentially for display purposes. The classes and properties in the LCC ontology, therefore, are the real carriers of distinction between Library of Congress Classification resources published at id.loc.gov.[19] The LCC ontology provides a way to describe the “underlying data,” which is a reference to the data one would find in a MARC classification record.

[15] http://id.loc.gov
[16] http://id.loc.gov/ontologies/lcc.html
[17] http://www.loc.gov/mads/rdf
[18] http://www.w3.org/2004/02/skos
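The labeling pattern described above can be made concrete with a small sketch: the caption alone is the authoritative and preferred label, the full hierarchy goes into an rdfs:label, and index terms become variants. This is an illustration under stated assumptions, not the service's actual output: the function name, the example URI, the blank-node naming, and the "--" separator used in the rdfs:label are all hypothetical, and triples are plain tuples with prefixed names.

```python
def describe_class_number(uri, caption, hierarchy, index_terms):
    """Build illustrative triples for one classification resource.

    caption: the main caption for this resource only.
    hierarchy: captions from broadest to this resource, kept for display.
    index_terms: strings to be treated as variant labels.
    """
    triples = [
        (uri, "rdf:type", "madsrdf:Authority"),
        (uri, "rdf:type", "skos:Concept"),
        # The tightly controlled label carries the caption alone.
        (uri, "madsrdf:authoritativeLabel", caption),
        (uri, "skos:prefLabel", caption),
        # The full lexical hierarchy survives as a plain rdfs:label.
        # The "--" separator is an assumption of this sketch.
        (uri, "rdfs:label", "--".join(hierarchy)),
    ]
    for i, term in enumerate(index_terms):
        variant = f"_:variant{i}"  # hypothetical blank-node identifier
        triples += [
            (uri, "madsrdf:hasVariant", variant),
            (variant, "rdf:type", "madsrdf:Variant"),
            (variant, "madsrdf:variantLabel", term),
            (uri, "skosxl:altLabel", variant),
            (variant, "rdf:type", "skosxl:Label"),
            (variant, "skosxl:literalForm", term),
        ]
    return triples


if __name__ == "__main__":
    # Made-up class number, captions, and index term.
    for triple in describe_class_number(
        "http://example.org/classification/ZZ100",
        "General works",
        ["Example class", "Example subclass", "General works"],
        ["Example index term"],
    ):
        print(triple)
```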
Data in the MARC classification record include information about classification-specific resource types such as tables and schedules, and describe how to apply table numbers to base numbers to generate an assignable classification number. As such, the LCC ontology defines Classes and Properties sufficient to accurately represent LCC data in RDF and sufficient to synthesize class numbers from schedules when and however appropriate. The ontology is a significant simplification of the MARC Classification codes, data element definitions, and conventions. One such simplification touches on the identification of different types of ranges defined in MARC Classification. Because there appears to be no meaningful distinction between a MARC Summary Range and MARC Defined Range with respect to their representation in RDF, specifically for LCC, these types are simply an LCC Range. On the other hand, it was deemed necessary to define an additional Table type - a Guide Table - where the MARC Classification format made no clear distinction between a Guide Table and an ordinary Table. A Guide Table is hierarchically the broadest table concept and carries the Table Rule, which is the instruction needed to synthesize a classification number between an LCC Schedule and an LCC Table. The small LCC ontology includes Classes for a Schedule, Range, Table, Guide Table, and Table Rule, all of which are types of resources that are somewhat unique to classification schemes. Additionally, classification-specific properties have been defined that relate these classes to each other, such as one that relates a Table to its Guide Table or another that relates a Guide Table to one or more Schedules, to which the Guide Table may apply.

[19] I have endeavored to capitalize the word “Class” (and Property) when referring to an OWL or RDF Class (or Property). Whenever referencing an entity associated directly with LCC - such as classification number, LCC class, class schedule, or class number - I have presented the word in all lowercase letters.
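To make the idea of synthesis concrete, the following sketch applies a table to a schedule's base number. The text does not spell out the Table Rule, so the rule used here (add each table number to the base number) is an assumption chosen for illustration, as are the type names, the class letters "ZZ", and the base number; only the captions “General works” and “Special topics” echo examples mentioned earlier in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TableEntry:
    number: int   # the table number to apply to a base number
    caption: str


@dataclass(frozen=True)
class GuideTable:
    # Human-readable statement of the rule; the arithmetic below is this
    # sketch's assumed reading of it, not a rule taken from the LCC data.
    rule: str
    entries: Tuple[TableEntry, ...]


def synthesize(class_letters: str, base: int, table: GuideTable) -> List[Tuple[str, str]]:
    """Return (classification number, caption) pairs for one schedule base number."""
    return [
        (f"{class_letters}{base + entry.number}", entry.caption)
        for entry in table.entries
    ]


if __name__ == "__main__":
    table = GuideTable(
        rule="Add the table number to the base number",
        entries=(TableEntry(0, "General works"), TableEntry(1, "Special topics")),
    )
    # "ZZ" and 100 are placeholders, not real LCC values.
    for number, caption in synthesize("ZZ", 100, table):
        print(number, caption)
```

Keeping the rule on the Guide Table, separate from the entries, mirrors the description of the Guide Table as the resource that carries the Table Rule.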
At all other times, MADS/RDF, which is fully mapped to SKOS, is employed (all data are, of course, also output as SKOS). Naturally, these Table, Guide Table, and Schedule resources are “underlying data” and are generally considered to be “non-assignable,” that is, they are resources that should not be used to describe another resource, such as a bibliographic one. Because these resources often have a one-to-one relationship with an underlying MARC Classification record, the LCCN of the underlying record has been used as part of the URI scheme. An LCCN that begins with “CF” represents a schedule; one that begins with “CT” represents a Guide Table or Table. However, when classification resources are described with the ClassNumber OWL Class, the resource could be described as assignable. The URIs for these resources end in a classification number or range. A ClassNumber resource may be an LCC Range or a MADS/RDF Topic. The former - an LCC Range - generally represents a group of concepts hierarchically related to the broader concept represented by the range. Of course, ranges are not assignable when traditionally assigning classification numbers to physical bibliographic resources. MADS/RDF Topic was used when the resource represented a single,
