Library of Congress Classification as linked data

Kevin Ford

What is linked data

The Library of Congress has published a select number of classes from the Library of Congress Classification (LCC) system as linked data as a new offering of its Linked Data Service,[1] commonly known as id.loc.gov. The offering, while still considered a beta project, provides URIs for resources that represent a simplified version of the underlying data found in the source MARC Classification records. The beta service also furnishes URIs for classification number resources that either derive directly from the underlying data or are the result of a synthesis between a schedule resource and a table resource. Although the data are presented in MADS/RDF [2] and SKOS [3] where appropriate, LCC as linked data is accompanied by a small LCC ontology to more accurately describe the types of classification resources and the relationships between them, especially where MADS/RDF and SKOS Class and Property definitions were seen as insufficient. This paper explores the publication of LCC as linked data and the accompanying ontology by contextualizing them with respect to prior efforts representing LCC as linked data, representing Dewey as linked data, and the appropriateness of SKOS for library classification data, especially given the historical need for a distinct MARC format for Classification.

[1] http://id.loc.gov.
[2] http://www.loc.gov/mads/rdf.
[3] http://www.w3.org/2004/02/skos.

JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013). DOI: 10.4403/jlis.it-5465

The Library of Congress classification system has existed since the late nineteenth century “to organize and arrange the book collections of the Library of Congress” (Library of Congress Classification). The system is organized into twenty-one classes, most of which are further divided into subclasses. Each class represents a field of knowledge, such as Art, Law, or History. Each subclass is further divided into more specific topics that basically adhere to a hierarchical representation of the field of knowledge. Like most classification systems, LCC is subject-based.
The resulting “number”, therefore, represents a distinct topic within the field of knowledge.

For decades LCC has been printed, bound, and distributed (at cost, basically) and still is today. One may acquire, for a price, the entire 41-volume set or one may choose individual classes or schedules. LCC is also accessible via Classification Web,[4] which is a sophisticated web application designed to assist catalogers with the assignment and creation of LCC classification numbers. It is offered as a subscription service for which LC charges a fee. Also for cost (basically), the Library of Congress Classification is available in MARC21 format and is made available as a bulk download, with periodic updates, from the Library’s Cataloging and Distribution Service. Notably, the raw data, though available, requires purchase and is not presented in accordance with linked data methods and principles.

[4] https://classificationweb.net.

The Library of Congress Classification as linked data does have a history, albeit a short and little known one. Karen Coyle laboriously scraped the first four levels (more or less) of all LC Classification classes from PDF documents hosted on the LC website to a plain text file (that is, something far more accessible for machines) and uploaded the resulting text file to archive.org.[5] This work dates to, and therefore the data predates, September 2007.[6] The PDF documents, which are still available (though perhaps updated since), present a detailed outline of LCC. Ed Summers then took the text file, generated a basic SKOS RDF representation from it, and developed a very simple website where he published the SKOS data.[7] This work was little publicized, but it is still active and accessible. Summers’s code is on GitHub.[8]

Coyle’s text file simply lists the classes (A, B, C, and so on) and the first three levels, if appropriate, of each subclass (AC, AE, AG, and so on). The concept’s label at any given level is matched with the class number. Because only the first few levels of LCC are outlined, most classification numbers represent a range of more specific topics.
Missing, nearly universally, from the detailed outline are language-specific divisions within topics, temporal divisions within topics, and form divisions within topics, in addition to simply greater granularity and specificity, such as the distinction between “General works” and “Special topics.”

[5] http://ia600304.us.archive.org/0/items/LcClassificationA-z/lc_class.txt.
[6] http://ia600304.us.archive.org/0/items/LcClassificationA-z/LcClassificationA-z_meta.xml.
[7] http://inkdroid.org/lcco.
[8] https://github.com/edsu/lcco.

From Coyle’s text file, Summers generated a skos:Concept resource for each classification number and associated label. He took each classification number and appended it to a base HTTP URI (in a namespace he controls) to create a unique identifier for the resource, and he made the lexical label for the topic (and class number) the skos:prefLabel. He generated skos:broader and skos:narrower relationships between classification topics when the classification number represented an encompassing range or a more specific range respectively. Summers created something akin to an LCSH-like pre-coordinated heading with the labels of narrower topics (i.e. those that fit contextually with broader topics): the skos:prefLabel of a narrower topic contains the labels of its broader relations, separated by two hyphens. The data collected by Coyle, which may have been all that was reasonably possible to collect, were limited to a class number, label, and hierarchy.

The first three levels of the Dewey Decimal Classification system, the Dewey Summaries, have been available as linked data since 2009.[9] OCLC published the full Dewey Decimal Classification as linked data in Summer 2012. As with Summers’s design, each topic is a skos:Concept with broader or narrower relations to any given topic’s hierarchical relatives. Published as it was by OCLC, the available data are richer, including information about provenance and licensing (no fewer than four statements for each Concept), creation and modification times, among a few others.
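The Summers-style conversion described above (one skos:Concept per outline row, a URI minted from the class number, a skos:prefLabel strung together with two hyphens, and hierarchy inferred from enclosing ranges) can be sketched roughly as follows. This is only an illustrative sketch, not Summers's actual code: the base URI, the function names, the notation parser, and the assumption that outline rows arrive with broader entries before narrower ones are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical namespace for illustration; not the namespace Summers used.
BASE_URI = "http://example.org/lcc/"

# Assumed notation shape: class letters, then an optional number or number range.
_NOTATION = re.compile(r"^([A-Z]+)(?:(\d+(?:\.\d+)?)(?:-(\d+(?:\.\d+)?))?)?$")


@dataclass(eq=False)
class Concept:
    notation: str                      # e.g. "QA71-90"
    caption: str                       # lexical label from the outline
    broader: Optional["Concept"] = None
    narrower: List["Concept"] = field(default_factory=list)

    @property
    def uri(self) -> str:
        # Class number appended to a base HTTP URI.
        return BASE_URI + self.notation

    @property
    def pref_label(self) -> str:
        # Labels of all broader relations, then this caption, joined by "--".
        labels = []
        node = self
        while node is not None:
            labels.append(node.caption)
            node = node.broader
        return "--".join(reversed(labels))


def parse(notation):
    """Split a notation into (letters, low, high); low/high are None for a bare class."""
    match = _NOTATION.match(notation)
    if match is None:
        raise ValueError("unsupported notation: " + notation)
    letters, low, high = match.groups()
    if low is None:
        return letters, None, None
    return letters, float(low), float(high or low)


def encloses(outer, inner):
    """True if the outer notation is an encompassing class or range of the inner one."""
    o_letters, o_low, o_high = parse(outer)
    i_letters, i_low, i_high = parse(inner)
    if o_letters != i_letters:
        # A bare class such as "Q" encloses every "QA..." notation.
        return o_low is None and i_letters.startswith(o_letters)
    if o_low is None:
        return i_low is not None
    if i_low is None:
        return False
    return o_low <= i_low and i_high <= o_high and (o_low, o_high) != (i_low, i_high)


def build(rows):
    """Turn (notation, caption) rows into linked Concepts.

    Assumes rows are in outline order, so the most recent enclosing
    row is the immediate broader concept.
    """
    concepts = []
    for notation, caption in rows:
        concept = Concept(notation, caption)
        for earlier in reversed(concepts):
            if encloses(earlier.notation, notation):
                concept.broader = earlier
                earlier.narrower.append(concept)
                break
        concepts.append(concept)
    return concepts
```

Calling `build` on a handful of outline rows and reading `pref_label` on a deep entry shows how quickly the concatenated label grows, which is the design choice that distinguishes this approach.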
Unlike Summers’s design, OCLC reserved the skos:prefLabel exclusively for the lexical label of the given Concept; broader relations are not strung together with the topic’s label to create the skos:prefLabel.

OCLC’s URI design patterns warrant special mention. Pains have been taken to embed some semantics into the URI pattern, reserving, essentially, one namespace each for “non-information resources (abstract or concrete real-world objects), generic resources, and their representations” (OCLC). Although some of the URI examples do not appear to function presently, the focus on URI composition and the need to represent a variety of different resource types bear on the representation of all aspects of publishing classification systems such as DDC and LCC as linked data.[10] A diverse set of resource types is also very relevant to LCC. In addition to the embedded semantics in the Dewey URIs, this issue received greater elucidation by Panzer and Zeng in two related publications (Panzer and Zeng; Zeng, Panzer, and Salaba).

[9] http://dewey.info.
[10] The actual service at http://dewey.info features diverse URI patterns, all of which appear to function, for all types of information resources.

The authors explored how to model classification schemes (notably DDC) in SKOS. Among other findings, the authors discuss how classification systems include “assignable” and “non-assignable” concepts. In DDC, an example of a non-assignable concept is a centered entry, or a classification number range or span for which there are likely a number of more specific topics and, therefore, specific numbers. In LCC, this is referred to as a range. There is also the issue, as Panzer and Zeng note (2009), of synthesized concepts (a classification number and topic that are a result of combining two concepts in the classification system) and non-synthesized concepts.
One risks some semantic incoherency when attempting to model all these types of things, and to establish appropriate relationships between them, purely in SKOS. Panzer and Zeng considered the need to create, minimally, an extension to the core SKOS vocabulary, but it was clear that an altogether separate attempt might be necessary, in a namespace entirely distinct from a SKOS one, to correctly capture the semantics and relationships. These same issues also materialized during the process of trying to represent LCC in SKOS.

SKOS, the Simple Knowledge Organization System, is designed “to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading lists and taxonomies within the framework of the Semantic Web”.[11] SKOS has proven to be extremely versatile and effective at representing thesauri, subject heading lists, and taxonomies (though, in part as a result of being intentionally simple, there can be some loss of granularity with respect to library data).

[11] http://www.w3.org/2004/02/skos.

In fact, data represented using the MARC Format for Authority data, such as subject heading lists like LCSH, map effortlessly to SKOS. This is seen readily and simply when decomposing a MARC Authority record into MADS/RDF and SKOS. For MARC Authority, a valid (i.e. not deprecated) authority record is the Concept. The 1XX, the main heading, becomes the authoritative or preferred label. MADS/RDF provides a means to capture the type of concept, be it a Topic, Geographic, GenreForm, or Temporal notion, and a few others. MADS/RDF also provides support for better representation of pre-coordinated headings. MARC Authority 4XX fields are variant or alternate labels. 5XX fields represent various relationships between terms, of which broader and narrower relationships are the most popular. MADS/RDF added a few additional relationships, such as those needed to accurately record connections between earlier or later established concepts, and a new resource type to clearly denote deprecated resources.
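The field-by-field decomposition just described (1XX to preferred label, 4XX to alternate labels, 5XX to broader or narrower relations) might be sketched as below. The simplified record shape, the function name, the heading-to-URI lookup, and the sample headings are all assumptions made for illustration; real MARC Authority records carry subfields and relationship codes that this sketch does not attempt to parse.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def authority_to_skos(concept_uri, fields, uri_for_heading):
    """Map a simplified authority record to (subject, predicate, object) triples.

    fields: list of (tag, heading, relation) tuples, where relation is
            "broader", "narrower", or None (a hypothetical simplification).
    uri_for_heading: dict resolving a related heading to its concept URI.
    Literals and URIs are both plain strings here to keep the sketch short.
    """
    # A valid (not deprecated) authority record is the Concept.
    triples = [(concept_uri, RDF_TYPE, SKOS + "Concept")]
    for tag, heading, relation in fields:
        if tag.startswith("1"):
            # 1XX: the main heading becomes the preferred label.
            triples.append((concept_uri, SKOS + "prefLabel", heading))
        elif tag.startswith("4"):
            # 4XX: variant or alternate labels.
            triples.append((concept_uri, SKOS + "altLabel", heading))
        elif tag.startswith("5") and relation in ("broader", "narrower"):
            # 5XX: relationships to other concepts, identified by URI.
            triples.append((concept_uri, SKOS + relation, uri_for_heading[heading]))
    return triples


# Invented sample record, not taken from LCSH.
sample = [
    ("150", "Widgets", None),
    ("450", "Gizmos", None),
    ("550", "Devices", "broader"),
]
lookup = {"Devices": "http://example.org/authorities/devices"}
result = authority_to_skos("http://example.org/authorities/widgets", sample, lookup)
```

The mapping is almost mechanical, which is the point of the passage: nothing in an authority record of this kind needs a class or property that SKOS lacks.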
A number of note fields defined in MARC Authority also have one-to-one mappings to MADS/RDF and SKOS. But MADS/RDF and SKOS classes and properties have been far less amenable to classification data, or at least to library-specific classification systems such as DDC and LCC.[12] This is essentially the difficulty Panzer and Zeng encountered during their research and it is the same encountered when attempting to publish LCC as linked data. At least when it comes to library classification systems such as DDC and LCC, this is unsurprising.

[12] This probably has a lot to do with the relative complexity of classification systems, especially with respect to how classification numbers are constructed, when compared to thesauri or “subject heading lists”; the aggregate expertise of the SKOS designers and members of the working group with respect to classification systems; and, partly as a natural extension of the previous point, a certain amount of partiality and attention given to, and in favor of, thesauri and “subject heading lists” during the development of SKOS.
[13] http://www.loc.gov/marc/classification.

The influential consideration here lies with the MARC21 format for Classification,[13] or, more specifically, with its very existence. Formally but provisionally published in June 1990, the MARC21 Format for Classification was specifically developed to facilitate the exchange and printing of classification data, most notably LCC and DDC (Guenther). Importantly, however, the new MARC format was the result of an attempt to modify the MARC format for Authority data (this work started in 1987/1988). After identifying most of the changes that would be required of the MARC Authority format, a draft of the proposed changes was presented to the committee overseeing changes to the MARC formats (MARBI). Following this review, and the early development period generally, it was clear that “there was less overlap with the authority format than originally anticipated, and... [the MARC Authority] codes and conventions were too constraining” (Guenther).
The proposal for classification data was rewritten to be a separate format, which would become the MARC Format for Classification by 1991.

The MARC Format for Classification, and its development process, took into consideration the very same semantic difficulties encountered by Panzer and Zeng, and the present author, when faced with “skosifying” complex library classification data, a difficulty that is compounded by the unsuitable nature of the RDF data element semantics. The MARC Format for Classification can represent class schedules and tables, neither of which is necessarily assignable as is. The format can represent ranges and hierarchy. Naturally, it has full support for notes and index terms. But SKOS semantics are not rich enough for this type of information. That said, SKOS can reasonably represent (assignable) classification topics and even class number ranges. It is with this information in mind, and the background work by Panzer and Zeng, that it was decided to present LCC as linked data as much as possible in MADS/RDF and SKOS but to define a small vocabulary in OWL to faithfully represent LCC-specific data and data elements where MADS/RDF and SKOS fall short.[14]

[14] http://id.loc.gov/ontologies/lcc.

Although there are a few ontological constraints on the data, constraints do not presently extend to how the data are used. For example, while it could be possible to infer “assignable” versus “non-assignable” resources from the intersection of select Classes in the ontology, this type of modeling has not been undertaken. As such, it is an experimental offering that attempts to make no semantic restrictions on its use but which strives to represent the derived and underlying data accurately. The ontology is also specific to LCC; it makes no attempt to model data elements specific to other classification systems, such as DDC. Also, though it would be unwise to rule out OCLC developing an ontology for DDC, the explicit declaration of classes in the small LCC ontology transfers the semantics embedded in dewey.info URIs to the data itself.
(“Smart” URIs and clear data semantics are not mutually exclusive and could, in fact, be complementary.)

A select number of Library of Congress Classification classes are available from LC’s Linked Data Service,[15] commonly known as id.loc.gov.[16] This offering, at the time of this publication, is very much a beta offering. During this stage, the data and its representation are subject to change, especially as more is learned about how the data is used and better ways for it to be represented are determined or developed. Nevertheless, it is an attempt not only to publish an RDF representation of the underlying data used to construct classification numbers but also to publish the classification numbers themselves. To this end, an effort has been made to apply the tables to schedules, thereby synthesizing a classification number, as appropriate.

[15] http://id.loc.gov.
[16] http://id.loc.gov/ontologies/lcc.html.
[17] http://www.loc.gov/mads/rdf.
[18] http://www.w3.org/2004/02/skos.

In order not to become too mired in MADS/RDF [17] and SKOS [18] semantics and restrictions, everything is a MADS/RDF Authority and SKOS Concept, with the exception of Index Terms, which can be interpreted as variants. They are therefore instantiated as MADS/RDF Variants and SKOS/XL Alternate Labels. The authoritative label (the preferred label and the tightly controlled term) is reserved for the main caption or term. This is therefore similar to how OCLC created Dewey resources and a departure from how Summers presented the data. The full lexically represented hierarchy that one finds in the source MARC records is recorded simply as an rdfs:label so that it is still available for parsing and potentially for display purposes. The classes and properties in the LCC ontology, therefore, are the real carriers of distinction between Library of Congress Classification resources published at id.loc.gov.[19] The LCC ontology provides a way to describe the “underlying data,” which is a reference to the data one would find in a MARC classification record.
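To make the labeling pattern in this passage more concrete, the following sketch assembles the triples one might expect for a single classification resource: typed as both a MADS/RDF Authority and a SKOS Concept, with the main caption as the authoritative and preferred label, the full hierarchy as an rdfs:label, and each index term as a MADS/RDF Variant and a SKOS-XL alternate label. The function name, the blank-node identifiers, the sample URI, and the sample captions are hypothetical, and the actual serialization at id.loc.gov may differ in detail.

```python
MADSRDF = "http://www.loc.gov/mads/rdf/v1#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(uri, caption, full_hierarchy, index_terms):
    """Return (subject, predicate, object) triples for one classification resource.

    caption:        the main caption or term (tightly controlled label)
    full_hierarchy: the lexical hierarchy string exactly as found in the source record
    index_terms:    index terms, treated as variants rather than concepts
    """
    triples = [
        (uri, RDF_TYPE, MADSRDF + "Authority"),
        (uri, RDF_TYPE, SKOS + "Concept"),
        (uri, MADSRDF + "authoritativeLabel", caption),
        (uri, SKOS + "prefLabel", caption),
        # The hierarchy is kept only as a plain label, for parsing or display.
        (uri, RDFS + "label", full_hierarchy),
    ]
    for i, term in enumerate(index_terms, start=1):
        variant = "_:variant%d" % i   # hypothetical blank-node ids
        xl_label = "_:xlabel%d" % i
        triples += [
            (uri, MADSRDF + "hasVariant", variant),
            (variant, RDF_TYPE, MADSRDF + "Variant"),
            (variant, MADSRDF + "variantLabel", term),
            (uri, SKOSXL + "altLabel", xl_label),
            (xl_label, RDF_TYPE, SKOSXL + "Label"),
            (xl_label, SKOSXL + "literalForm", term),
        ]
    return triples


# Invented sample values for illustration only.
example = describe(
    "http://example.org/classification/X100",
    "Example caption",
    "Example class--Example subclass--Example caption",
    ["Example index term"],
)
```

Note that nothing in these triples says whether the resource is a schedule, a table, or a range; under this design that distinction is left to the LCC ontology's own Classes.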
Data in the MARC classification record include information about classification-specific resource types such as tables and schedules, and the data describe details about how to apply table numbers to base numbers to generate an assignable classification number. As such, the LCC ontology defines Classes and Properties sufficient to accurately represent LCC data in RDF and sufficient to synthesize class numbers from schedules when and however appropriate. The ontology is a significant simplification of the MARC Classification codes, data element definitions, and conventions. One such simplification touches on the identification of different types of ranges defined in MARC Classification. Because there appears to be no meaningful distinction between a MARC Summary Range and MARC Defined Range with respect to their representation in RDF, specifically for LCC, these types are simply an LCC Range. On the other hand, it was deemed necessary to define an additional Table type, a Guide Table, where the MARC Classification format made no clear distinction between the two. A Guide Table is hierarchically the broadest table concept and carries the Table Rule, which is the instruction needed to synthesize a classification number between an LCC Schedule and an LCC Table.

[19] I have endeavored to capitalize the word “Class” (and Property) when referring to an OWL or RDF Class (or Property). Whenever referencing an entity associated directly with LCC, such as classification number, LCC class, class schedule, or class number, I have presented the word in all lowercase letters.

The small LCC ontology includes Classes for a Schedule, Range, Table, Guide Table, and Table Rule, all of which are types of resources that are somewhat unique to classification schemes. Additionally, classification-specific properties have been defined that relate these classes to each other, such as one that relates a Table to its Guide Table or another that relates a Guide Table to one or more Schedules, to which the Guide Table may apply.
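The relationships among these resource types can be pictured with a small data model. The sketch below is an assumption-laden illustration rather than the LCC ontology itself: the Python class and attribute names are hypothetical stand-ins for the ontology's Classes and properties, the identifiers and captions are invented, and the synthesis rule (adding a table number to a schedule's base number) is a deliberate simplification of whatever instruction a real Table Rule carries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Schedule:
    identifier: str       # hypothetical identifier for the underlying record
    class_letters: str    # e.g. "PZ"
    base_number: int      # base number to which table numbers are applied
    caption: str


@dataclass(frozen=True)
class TableRule:
    instruction: str      # human-readable synthesis instruction


@dataclass
class GuideTable:
    identifier: str
    caption: str
    rule: TableRule               # the Guide Table carries the Table Rule
    applies_to: List[Schedule]    # Schedules to which this Guide Table may apply


@dataclass
class Table:
    identifier: str
    table_number: int
    caption: str
    guide: GuideTable             # relates a Table to its Guide Table


def synthesize(schedule: Schedule, table: Table) -> str:
    """Combine a Schedule and a Table entry into a class number string.

    Assumed rule for illustration: class letters followed by
    base number plus table number.
    """
    if schedule not in table.guide.applies_to:
        raise ValueError("guide table does not apply to this schedule")
    return "%s%d" % (schedule.class_letters, schedule.base_number + table.table_number)


# Invented sample data for illustration only.
schedule = Schedule("example-schedule", "PZ", 10, "Example schedule caption")
guide = GuideTable(
    "example-guide",
    "Example guide table",
    TableRule("Add the table number to the schedule base number"),
    [schedule],
)
entry = Table("example-table-entry", 5, "Example table entry", guide)
number = synthesize(schedule, entry)
```

Keeping the rule on the Guide Table rather than on each Table entry mirrors the description above, in which the Guide Table is the broadest table concept and the single place where the synthesis instruction lives.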
Wherever these LCC-specific Classes and properties are not needed, MADS/RDF, which is fully mapped to SKOS, is employed (all data are, of course, also output as SKOS).

Naturally, these Table, Guide Table, and Schedule resources are “underlying data” and are generally considered to be “non-assignable,” that is, they are resources that should not be used to describe another resource, such as a bibliographic one. Because these resources often have a one-to-one relationship with an underlying MARC Classification record, the LCCN of the underlying record has been used as part of the URI scheme. An LCCN that begins with “CF” represents a schedule; one that begins with “CT” represents a Guide Table or Table.

However, when classification resources are described with the ClassNumber OWL Class, the resource could be described as assignable. The URIs for these resources end in a classification number or range. A ClassNumber resource may be an LCC Range or a MADS/RDF Topic. The former, an LCC Range, generally represents a group of concepts hierarchically related to the broader concept represented by the range. Of course, ranges are not assignable when traditionally assigning classification numbers to physical bibliographic resources. MADS/RDF Topic was used when the resource represented a single,