Trust and persistence for internet resources

Maurizio Lunghi, Chiara Cirinnà, Emanuele Bellini

JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013). DOI: 10.4403/jlis.it-5494

Introduction

The internet has radically changed our way of working, communicating, living, producing and accessing information, interacting with institutions and bodies, buying things and managing resources. Now everything is available on an open and flexible infrastructure, often freely accessible to all users: content is usable by many services tailored to user requirements. The web has probably been the killer application for the internet. In the past few years, the web has moved from a web of documents towards a web of data, where information is no longer packaged in fixed documents but is available in a de-structured form that users can exploit more flexibly. Recent developments on the web have seen the emergence of semantic web technologies and the linked open data (LOD) [1] approach, associated with an increasingly large amount of data available for publishing and connecting structured data on the web. Linked data best practices, supported by W3C [2], are now ready to be endorsed by a relevant number of data providers, leading to the creation of a global data space: the web of data.

Unfortunately, the LOD 5 stars [3] are mainly oriented towards the usability and standardization of data published on the web, without considering the trustworthiness and persistence of the data and of the URIs used to refer to them. In fact, the objective of the LOD approach seems to be to make a huge amount of data accessible on the web in a non-proprietary format (e.g. CSV instead of Excel) and to link these data to other datasets (e.g. GeoNames [4] or DBpedia [5]) to disambiguate content and to provide context. However, in some cases, and especially in the cultural and educational domains, besides retrieving the needed data or their relations, it is equally important to get information about their authenticity, integrity and provenance.

[1] http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[2] W3C - http://www.w3c.it
Systems for certification using persistent identifiers (PIs) for digital objects, for authors and for institutions can be of great help in refining the quality of the information retrievable from the internet and in greatly increasing its usability and the development of potential new services. This paradigm, based on the identification and interconnection of data, offers solutions to many current library issues, such as enhanced web searching, authority control, classification, data portability and disambiguation. In the web of documents, identification and trust were provided by web sites and the institutions supporting them; in the web of data they are integrated in the single piece of data. The evolution of this paradigm is increasingly important in a vision for the long term curation of digital resources.

[3] http://www.w3.org/DesignIssues/LinkedData.html
[4] http://www.geonames.org/ontology/documentation.html
[5] http://en.wikipedia.org/wiki/DBpedia

Requirements for the long term curation of digital resources

Presently the number of scientific and cultural heritage digital resources made available on the internet through digital library applications is constantly growing, and it is now crucial to guarantee persistence, authority, reliability and wide dissemination of resources while supporting their long term curation. One of the main requirements to tackle this issue is to adopt credible PI systems within the life cycle of these resources. A PI should be assigned only to resources that are stable, significant for the related user community and suited to the scope of the identification system. A number of initiatives, standards and technologies are available, but it may be difficult for an institution to understand which of these are most appropriate for its digital objects. PI technologies help to make the reference to a digital resource stable, even if it is well known that persistence is not only a technical issue.
In fact, these technologies are obviously not reliable per se: no technology can exist indefinitely or guarantee services without a trustworthy organization and clearly defined policies. In our vision, a PI system is the available technology plus a trustworthy organization and precise policies for digital preservation, implemented by the managers of the related user community. The concept of persistence moves from the commitment of a single institution or registration authority to a commitment of the entire user community served by the PI. A PI system can be considered a contract between the final users and the service providers responsible for the implementation and maintenance of the PI service and the functionality of the system. From this point of view, the persistence of a PI depends also on the commitment of the community that promotes and uses the identification system for its own resources. This happens when the standard adopted is effectively oriented to the community requirements and the authority in charge of managing the system is recognized by the community itself.

It is well known that the structural instability of simple URLs (e.g. domains no longer available) and of the related resources (relocation or updating) is one of the main issues preventing the use of the internet as a trustworthy platform for research and for the dissemination of digital content. The current use of simple URLs as persistent digital object identifiers brings many documented risks in a long term vision, not only for retrieval of and access to resources but also with respect to the loss of references to digital documents and the lack of guarantees of authority and provenance. These risks affect:

a) the cultural heritage and research domains, preventing the implementation of reliable citability services, research evaluation, digital preservation, access, etc.,
b) the business domain, preventing the use of purchase services provided on these objects,
c) the public domain (e-gov), slowing down the dematerialization process of public administrations.
It is clear that the problem is not only how to deal with the HTTP 404 error: the need is moving towards identification systems able to support authority, reliability, preservation, certification, exploitation and wide dissemination of these resources. A trustworthy solution is to associate a trusted PI with the digital resources.

The challenge of trust

Trust, broadly speaking, concerns the assessment and management of the risks perceived by each actor entering into a relationship. In other words, "trust entails risk". According to the ISO definition, risk can be defined as the combination of the probability of an event and its consequences (ISO/IEC Guide 73). There are a number of events with bad consequences that could occur during the lifetime of a PI service, with different degrees of probability but all with high costs in case of failure. Examples of these risks are:

a) failing to determine the initial and recurring costs and the pricing of the service (risk associated with financial sustainability),
b) adopting technologies that are no longer available (risk associated with standards adoption),
c) the identified object being no longer available on the network (risk associated with the agreement between content and service providers),
d) losing the support of the community (risk associated with the community mandate), etc.

These factors can lower the trustworthiness of the PI service in the eyes of the content provider and affect the dissemination and exploitation of digital resources. The various digital repositories store intangible objects and entities and make them available to users through telematic networks: we access our bank account, as well as the hospital or the municipality for official documents; we download tons of files and chat with avatars. But who certifies the identity of the actors and guarantees our privacy? How can we rely on the authenticity of the documents we download? And how can we trust the institute issuing an 'official' document? What is the risk if we cannot demonstrate that a document is valid for our expected purposes? What are the risks?
A good amount of trustworthiness is necessary to live in this virtual and artificial world. A PI service must address at least the following core requirements:

1. global uniqueness: the PI is clearly part of a name domain, is unique and is associated with a unique resource.
2. persistence: it refers to the permanent lifetime of the significant properties of an identifier; for example, it is not possible to reassign the PI to other resources or to delete it.
3. resolvability: it refers to the possibility of retrieving information regarding a resource or of accessing it directly on the internet.

Currently, there are different technologies and standards for the implementation of PI systems, but there is no general agreement on their adoption, often because some of these systems were born as technical solutions, without the support of the community of users who need specific levels of PI service. Systems like PURL or Cool URIs (Berners-Lee) have considerable advantages in supporting the implementation of the web of data thanks to their immediate de-referenceability through the HTTP protocol, but on the other hand they have several limitations because their persistence is not guaranteed in principle by an independent and trustworthy third party. It is well known that the Cool URI approach to persistence is based on URL design. This approach, even if it is considered a best practice for the implementation of the semantic web in general and linked data in particular, is mainly based on technical solutions. The basic assumption is that a correct design of URIs should reduce the need to change them, thereby ensuring their stability over time. An example of this best practice is to avoid explicit web page extensions such as .php or .asp, so that changes in the implementation technology (e.g. from PHP to ASP) do not affect the form of the URI. In this perspective, persistence is based solely on the commitment of individual institutions establishing a trusted relationship directly with the final users, without the mediation of a third party. Unfortunately, it is well known that the commitment of a single institution is no longer sufficient to ensure either the long term persistence of URLs or the trustworthiness of the resources in terms of provenance, authenticity, integrity, conservation, and so on.

In practice, resources move around the network; they can be changed or deleted due to a multitude of factors that cannot always be predetermined or regulated by the content management policies of institutions or governed by best practice techniques. A typical case occurs when an institution ceases to exist because it has been absorbed by another institution or suppressed, or simply when its official name has changed. In these cases, the digital objects can be renamed to fit the workflow of the new institution, transferred to other institutions, or at worst deleted because they are no longer relevant to institutional goals. It is clear that all these actions can break the old URLs regardless of how they were built. This may not be a problem if the institution does not handle scientific, cultural or administrative resources, but it becomes a critical issue if these changes affect institutions such as scientific data stores, libraries, archives, governmental datasets, and so forth. In these cases, for example, bibliographies based on simple URLs or even Cool URIs referring to resources that were present in the archives of these institutions can no longer be used to check the scientific work or to calculate bibliometric indexes. Another critical issue is related to the connection of datasets which have been updated several times. In such cases, it may be difficult or even impossible to verify the validity of the scientific outcome presented in a related paper. What is most critical, however, is the impossibility of implementing systems to check the authenticity, provenance and integrity of these resources, because of the absence of a third party able to guarantee the name-resource association. In this scenario, most benefits of wide access to linked datasets are cancelled out by their lack of reliability.
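The three core requirements discussed in this section (global uniqueness, persistence, resolvability) and the role of a third party guaranteeing the name-resource association can be illustrated with a minimal sketch. It is not the implementation of any existing PI service: the class name, the method names, the identifier string and the example.org-style addresses are all hypothetical, and a real service would add policies, authentication and durable storage.

```python
class PersistentIdentifierRegistry:
    """Toy in-memory registry kept by a hypothetical third party."""

    def __init__(self):
        # Maps each identifier to the current location of its resource.
        self._locations = {}

    def register(self, identifier, url):
        # Global uniqueness and persistence: an identifier is bound once
        # and is never reassigned to a different resource.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already assigned")
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # The same resource has moved (e.g. its institution was absorbed);
        # only the location changes, the identifier stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Resolvability: return the current location of the resource.
        return self._locations[identifier]

    # Deliberately no delete method: identifiers are never removed.


registry = PersistentIdentifierRegistry()
registry.register("example-pi:report-1",
                  "http://old-institute.example/docs/report1")
registry.relocate("example-pi:report-1",
                  "http://new-institute.example/archive/report1")
print(registry.resolve("example-pi:report-1"))
```

A bibliography that cites "example-pi:report-1" keeps working after the move, whereas one citing the old address directly would break.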
NBN:IT service as a support for trust in LOD

To tackle the challenge of trust in LOD, a possible solution could be to adopt a URN-based PI solution [6]. Presently, the main approach to implementing a PI system is to separate the identification of resources from their localisation. As noted above, Tim Berners-Lee advises that adopting clear and stable policies and implementation guidelines is sufficient to manage the persistent identification of resources on the internet. Even if this suggestion is reasonable and appropriate in some domains, it is evident that we cannot delegate this responsibility to each institution, in particular in the scientific and cultural heritage domains, since many institutions fail to decide on the approach and strategy to adopt in terms of content selection, formats, naming, etc.

In any case, Uniform Resource Identifiers (URIs) are widely used in the semantic web context to identify any type of resource or any real, digital, abstract or virtual object, trying to harmonise the applications of all user communities in a semantic vision. For instance, to address this issue, the info-URI scheme [7] was developed by the library and publishing communities for "URIs of information assets that have identifiers in public namespaces but have no representation within the URI allocation". It is clear that, in order to refer to a certified digital object in a trustworthy way, the use of URNs or of identifiers that implement RFC 1737 (Functional Requirements for Uniform Resource Names) is today a best practice.

[6] APARSEN DE22.1 Persistent Identifiers Interoperability Framework - http://www.alliancepermanentaccess.org/wp-content/plugins/download-monitor/download.php?id=D22.1+Persistent+Identifiers+Interoperability+Framework
[7] RFC 4452: http://info-uri.info
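To make the separation between identification and localisation concrete, the sketch below splits a URN into its namespace identifier (NID) and namespace-specific string (NSS), following the general shape urn:<NID>:<NSS>. It is a simplified check written for illustration only: the pattern does not enforce every rule of the URN syntax specification, the function name is hypothetical, and the sample name is invented rather than a real NBN:IT identifier.

```python
import re

# Simplified pattern for "urn:<NID>:<NSS>". Assumption: this captures only
# the overall shape of a URN, not the full character rules of the syntax.
URN_PATTERN = re.compile(r"^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$",
                         re.IGNORECASE)


def parse_urn(urn):
    """Return (nid, nss) for a URN-shaped string, or raise ValueError."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    # The namespace identifier is case-insensitive, so normalise it.
    return nid.lower(), nss


# Hypothetical name, used only to show the structure.
nid, nss = parse_urn("urn:nbn:it:example-0001")
print(nid, nss)
```

Note that neither part of the name carries a host or a path: where the resource currently lives is left to a separate resolution step.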
The purpose of a URN is to provide a globally unique, persistent, location-independent resource identifier which can be used to identify a resource and to access its characteristics or the resource itself. The URN specification is part of the IETF family of specifications encompassed by the URI framework. This framework also includes URLs, which specify both a protocol and a location in order to give access to resources on the web. IANA is the registration authority for URN namespaces. URNs are designed to enable heterogeneous namespaces to be mapped onto the URN space, and therefore to enable the reuse of well-known identifiers. Unlike URLs, URNs are not directly actionable (browsers generally do not know what to do with a URN) because they have no associated global infrastructure that enables resolution (such as the DNS supporting URLs). Although several implementations have been made, each proposing its own means of resolution through the use of plug-ins or proxy servers, an infrastructure that enables large scale resolution has not been implemented. However, individual namespace implementations, such as URN-NBN or the DOI, offer a resolution service available on the internet. The NBN namespace, as a namespace identifier (NID), has been registered and adopted by the Nordic Metadata Projects, but it is being implemented separately by individual systems, with no reference implementation enabling the coordination of information sources. In fact, several national libraries have developed their own NBN systems within national projects; several implementations are currently in use, each with different descriptive metadata or granularity levels. Given this lack of coordination, it is clear that such PIs cannot, as they stand, successfully support LOD trustworthiness.

The NBN-Italy service supports at least three levels of persistence (Bellini et al., "The National Bibliography Number Italia (NBN:IT) Project. A persistent identifier supporting national legal deposit for digital resources"):

1. Persistence of the NBN identifier. If the resource is no longer available online, the URN identifier will be maintained (e.g. as proof that the resource existed at some point);
2. Persistence of the association between URNs and URLs. This is a commitment ensuring that in the long term the URN is resolvable (i.e. that it leads at least to an address of URL type). Accessibility of the resource is not guaranteed, but access to the so-called "tombstone" is assured if the resource is no longer available on the network (e.g. "This ebook is no longer on the market");
3. Persistence of the resource referenced by the NBN. This ensures the long-term existence and accessibility of the resource referenced by the URN. This level of NBN persistence is made possible by the storage (statutory or voluntary) at the national libraries and by the authoritative description of the national bibliography.

Thanks to these levels of service, NBN-Italy names represent a clear added value if used in LOD architectures to support the trustworthiness of assertions (RDF triples). This proposal goes towards the integration of LOD and PI systems, by exploiting the on-going initiatives and projects outlined in the next paragraph.

Next steps: Den Haag Manifesto 2.0 and Florence Agenda

The forthcoming event "Cultural Heritage OnLine 2012", to be held in Florence in December 2012, aims to improve the "Den Haag Manifesto" and make it effective by bringing together several on-going related initiatives, projects and stakeholders, such as: APARSEN