Trust and persistence for internet resources

Maurizio Lunghi, Chiara Cirinnà, Emanuele Bellini

JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013). DOI: 10.4403/jlis.it-5494

Introduction

The internet has radically changed our way of working, communicating, living, producing and accessing information, interacting with institutions and bodies, buying things and managing resources. Now everything is available on an open and flexible infrastructure, often freely accessible to all users: contents are usable by many services tailored to user requirements. The web has probably been the killer application for the internet. In the past few years, the web has moved from a web of documents towards a web of data, where information is no longer packaged in fixed documents but is available in a de-structured way and usable more flexibly by users. Recent developments on the web have witnessed the emergence of semantic web technologies and the linked open data (LOD) [1] approach, associated with an increasingly large amount of data available for publishing and connecting structured data on the web. Linked data best practices, supported by W3C, [2] are now ready to be endorsed by a relevant number of data providers, leading to the creation of a global data space - the web of data.

Unfortunately, the LOD 5 stars [3] are mainly oriented towards the usability and standardization of data published on the web, without considering the trustability and persistence of the data and of the URIs used to refer to them. In fact, the objective of the LOD approach seems to be to make a huge amount of data on the web accessible in a non-proprietary format (e.g. CSV instead of Excel) and to link these data to other datasets (e.g. GeoNames [4] or DBpedia [5]) to disambiguate content and to provide a context. However, in some cases, and especially in the cultural and educational domains, besides retrieving the needed data or their relations, it is equally important to get information about their authenticity, integrity and provenance.

[1] http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.
[2] W3C - http://www.w3c.it.
Systems for certification using persistent identifiers (PIs) for digital objects, for authors and for institutions can be of great help in refining the quality of information retrievable from the internet and in largely increasing its usability and the development of potential new services. This paradigm, based on the identification and interconnection of data, offers solutions to many current library issues, like enhanced web searching, authority control, classification, data portability and disambiguation. In the web of documents, identification and trust were provided by web sites and the institutions supporting them; in the web of data they are integrated in the single piece of data. The evolution of this paradigm is increasingly important in a vision for the long term curation of digital resources.

[3] http://www.w3.org/DesignIssues/LinkedData.html.
[4] http://www.geonames.org/ontology/documentation.html.
[5] http://en.wikipedia.org/wiki/DBpedia.

Requirements for the long term curation of digital resources

Presently the number of scientific and cultural heritage digital resources made available on the internet through digital library applications is constantly growing, and it is now crucial to guarantee persistency, authority, reliability and wide dissemination of resources while supporting their long term curation. One of the main requirements to tackle this issue is to adopt credible PI systems within the life cycle of these resources. A PI should be assigned only to resources that are stable, significant for the related user community and suitable for the scope of the identification system. A number of initiatives, standards and technologies are available, but it may be difficult for an institution to understand which of these are more appropriate for its digital objects. PI technologies help make the reference to a digital resource stable, even if it is well known that persistency isn't only a technical issue.
In fact, these technologies are obviously not reliable per se: no technology can exist indefinitely or guarantee services without a trustable organization and clearly defined policies. In our vision, PI systems are meant as the available technology plus a trustable organization and precise policies for digital preservation, implemented by the managers of the related user community. The concept of persistence moves from the commitment of an institution or registration authority to a commitment of the entire user community served by the PI. A PI system can be considered as a contract between the final users and the service providers responsible for the implementation and maintenance of the PI service and the functionality of the system. From this point of view, the persistence of a PI depends also on the commitment of the community that promotes and uses the identification system for its own resources. This happens when the standard adopted is effectively oriented to the community requirements and the authority in charge of managing the system is recognized by the community itself.

It is well known that the structural instability of simple URLs (e.g. domains no longer available) and of the related resources (relocation or updating) is one of the main issues that prevent the use of the internet as a trustworthy platform for the research and dissemination of digital contents. The current use of simple URLs as persistent digital object identifiers brings many documented risks in a long term vision, not only for retrieval and access of resources but also with respect to the loss of reference to the digital documents or the lack of guarantee of authority and provenance. These risks affect:

a) the cultural heritage and research domains, preventing the implementation of reliable citability services, research evaluation, digital preservation, access, etc.,
b) the business domain, preventing the use of purchase services provided on these objects,
c) the public domain (e-gov), slowing down the dematerialization process of public administrations.
It is clear that the problem is not only to face the HTTP 404 error, but to move towards identification systems able to support authority, reliability, preservation, certification, exploitation and wide dissemination of these resources. A trustworthy solution is to associate a trusted PI with the digital resources.

The challenge of trust

Trust, broadly speaking, concerns the assessment and management of the risks perceived by each actor entering into a relationship. In other words, "trust entails risk". According to the ISO definition, risk can be defined as the combination of the probability of an event and its consequences (ISO/IEC Guide 73). There are a number of events with bad consequences that could occur during the lifetime of the PI service, with different degrees of probability but all with high costs in case of failure. Examples of these risks are:

a) failing to determine the initial and recurring costs and the pricing of the service (risk associated with financial sustainability),
b) adopting technologies that are no longer available (risk associated with standards adoption),
c) the object identified is no longer available on the network (risk associated with the agreement between content and service providers),
d) losing the support of the community (risk associated with the community mandate), etc.

These factors can lower the trustworthiness of the PI service in the eyes of the content provider and affect the dissemination and exploitation of digital resources.

The various digital repositories store intangible objects and entities and make them available to users through telematic networks: we access our bank account as well as the hospital or the municipality for official documents, we download tons of files and chat with avatars. But who certifies the identity of actors and guarantees our privacy? How can we rely on the authenticity of the documents we download? And how can we trust the institute issuing an 'official' document? What is the risk if we cannot demonstrate that a document is not valid for our expected purposes? Which are the risks?
A good amount of trustworthiness is necessary to live in this virtual and artificial world. A PI service must address at least the following core requirements:

1. global uniqueness: the PI is clearly part of a name domain and it is unique and associated with a unique resource.
2. persistence: it refers to the permanent lifetime of the significant properties of an identifier; for example, it is not possible to reassign the PI to other resources or to delete it.
3. resolvability: it refers to the possibility of retrieving information regarding a resource or of accessing it directly on the internet.

Currently, there are different technologies and standards for the implementation of PI systems, but there isn't a general agreement on their adoption, often because some of these systems were born as technical solutions, without the support of the community of users who need specific levels of PI services. Systems like PURL or Cool URIs (Berners-Lee) have considerable advantages in supporting the web of data implementation thanks to their immediate dereferenceability through the HTTP protocol but, on the other hand, there are several limitations due to the fact that their persistence is not guaranteed in principle by an independent and trustable third party. It is well known that the Cool URI approach to persistence is based on URL design. This approach, even if it is considered a best practice for the implementation of the semantic web in general and linked data in particular, is mainly based on technical solutions. The basic assumption is that a correct design of URIs should reduce the need to change them, in order to ensure their stability over time. An example of this best practice is to avoid explicit web page extensions such as .php or .asp, so that changes in implementation technology (e.g. from PHP to ASP) do not affect the URI form. In this perspective, persistence is based uniquely on the commitment of individual institutions establishing a trusted relationship directly with the final users, without the mediation of a third party.

Unfortunately, it is well known that the commitment of a single institution is no longer sufficient to ensure either the long term persistence of URLs or the trustworthiness of the resources in terms of provenance, authenticity, integrity, conservation, and so on. In practice, resources move on the network; they can be changed or deleted due to a multitude of factors that cannot always be predetermined or regulated by the content management policies of institutions or governed by best practice techniques. A typical case occurs when an institution ceases to exist because it has been absorbed by another institution, or it is suppressed, or simply its official name has changed. In these cases, the digital objects can be renamed to be adapted to the workflow of the new institution, or transferred to other institutions, or at worst deleted because they are no longer relevant to institutional goals. It is clear that all these actions can break the old URLs independently of how they were built. This may not be a problem if the institution does not handle scientific, cultural or administrative resources, but it becomes a critical issue if these changes affect institutions like scientific data stores, libraries, archives, governmental datasets, and so forth. In these cases, for example, bibliographies based on simple URLs or even Cool URIs referring to resources that were present in the archives of these institutions can no longer be used to check the scientific work or to calculate bibliometric indexes. Another critical issue is related to the connection of datasets which have been updated several times. In such cases, it may be difficult or even impossible to verify the validity of the scientific outcome presented in a related paper. What is most critical, however, is the impossibility of implementing systems to check the authenticity, provenance and integrity of these resources, because of the absence of a third party able to guarantee the name-resource association. In this scenario, most benefits of wide access to linked datasets are dissolved by their lack of reliability.
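The three core requirements listed earlier in this section (global uniqueness, persistence, resolvability) can be made concrete with a small sketch. The class below is a toy, in-memory registry written only for illustration: the name PIRegistry, its methods and the "example-ns" namespace are assumptions of this example and do not describe any real PI service.

```python
class PIRegistry:
    """Toy persistent-identifier registry (hypothetical, illustrative only)."""

    def __init__(self, namespace):
        self.namespace = namespace  # the name domain every PI belongs to
        self._records = {}          # PI -> description of the resource

    def mint(self, local_name, resource):
        """Global uniqueness: one PI per resource, one resource per PI."""
        pid = f"{self.namespace}:{local_name}"
        if pid in self._records:
            raise ValueError(f"{pid} is already assigned")
        if resource in self._records.values():
            raise ValueError("this resource already has a PI")
        self._records[pid] = resource
        return pid

    def reassign(self, pid, resource):
        """Persistence: a PI can never be pointed at another resource."""
        raise PermissionError(f"{pid} cannot be reassigned")

    def delete(self, pid):
        """Persistence: a PI can never be removed from the registry."""
        raise PermissionError(f"{pid} cannot be deleted")

    def resolve(self, pid):
        """Resolvability: return the information held about the resource."""
        if pid not in self._records:
            raise KeyError(f"unknown identifier: {pid}")
        return self._records[pid]


# Usage with made-up values.
registry = PIRegistry("example-ns")
pid = registry.mint("0001", "Annual report 2012 (PDF)")
print(pid, "->", registry.resolve(pid))
```

The code can only enforce the rules while the registry itself exists, which mirrors the argument above: the technical mechanism is easy, whereas persistence over time depends on the organization and the policies behind it.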
NBN:IT service as a support of trust LOD

To tackle the challenge of trust in LOD, a possible solution could be to adopt a URN-based PI solution. [6] Presently, to implement a PI system, the main approach is to separate the identification from the localisation of the resources. As shown above, Tim Berners-Lee advises that adopting clear and stable policies and implementation guidelines is sufficient to manage the persistent identification of resources on the internet. Even if this suggestion is reasonable and appropriate in some domains, it is evident that we cannot delegate this responsibility to each institution, in particular in the scientific and cultural heritage domain, for two main reasons:

1. many institutions fail to decide the approach and the strategy to be adopted in terms of content selection, formats, naming, etc.;
2. many institutions fail to decide the approach and the strategy to be adopted in terms of content selection, formats, naming, etc.

In any case, Uniform Resource Identifiers (URIs) are widely used in the semantic web context to identify any type of resource or any real, digital, abstract or virtual object, trying to harmonise in a semantic vision the applications of all user communities. For instance, to address this issue, the info-URI scheme [7] was developed by the library and publishing communities for "URIs of information assets that have identifiers in public namespaces but have no representation within the URI allocation". It is clear that, in order to refer to a certified digital object in a trustable way, the use of URNs or identifiers that implement RFC 1737 (Functional requirements for Uniform Resource Names) is today a best practice.

[6] APARSEN DE22.1 Persistent Identifiers Interoperability Framework - http://www.alliancepermanentaccess.org/wp-content/plugins/download-monitor/download.php?id=D22.1+Persistent+Identifiers+Interoperability+Framework.
[7] RFC 4452: http://info-uri.info.
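The separation of identification from localisation mentioned above can be sketched in a few lines. The example assumes the general urn:<NID>:<NSS> layout of the IETF URN syntax; the UrnResolver class, the identifier urn:nbn:it:example-0001 and both URLs are invented for illustration and are not taken from the NBN:IT service.

```python
import re

# urn:<NID>:<NSS>; the NID is 1-32 letters, digits or hyphens and
# does not start with a hyphen. The NSS is left unvalidated here.
URN_PATTERN = re.compile(r"^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$", re.IGNORECASE)


def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    return nid.lower(), nss


class UrnResolver:
    """Hypothetical resolver: the name is fixed, only the location changes."""

    def __init__(self):
        self._locations = {}  # URN -> current URL

    def register(self, urn, url):
        parse_urn(urn)  # reject strings that are not URNs
        if urn in self._locations:
            raise ValueError(f"{urn} is already registered")
        self._locations[urn] = url

    def relocate(self, urn, new_url):
        """Record a new location without touching the identifier."""
        if urn not in self._locations:
            raise KeyError(urn)
        self._locations[urn] = new_url

    def resolve(self, urn):
        return self._locations[urn]


# Made-up identifier and URLs: the resource moves, citations stay valid.
HYPOTHETICAL_URN = "urn:nbn:it:example-0001"
resolver = UrnResolver()
resolver.register(HYPOTHETICAL_URN, "http://old-institution.example/docs/report")
resolver.relocate(HYPOTHETICAL_URN, "http://new-institution.example/archive/report")
print(parse_urn(HYPOTHETICAL_URN), resolver.resolve(HYPOTHETICAL_URN))
```

A bibliography that cites the URN keeps working after the move, whereas one that cites the first URL would break; who operates the resolver, and under which policies, is exactly the trust question the article raises.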
The purpose of a URN is to provide a globally unique, persistent, location-independent resource identifier which can be used for the identification of and access to the characteristics of a resource, or for access to the resource itself. The URN specification is part of the IETF family of specifications encompassed by the URI framework. This framework also includes URLs, which specify both a protocol and a location in order to give access to resources on the web. IANA is the registration authority for URN namespaces. URNs are designed to enable the mapping of heterogeneous namespaces onto a URN space, and therefore to enable the reuse of well-known identifiers. Unlike URLs, URNs are not directly actionable (browsers generally do not know what to do with a URN) because they have no associated global infrastructure that enables resolution (such as the DNS supporting URLs). Although several implementations have been made, each proposing its own means of resolution through the use of plug-ins or proxy servers, an infrastructure that enables large scale resolution has not been implemented. However, single namespace implementations, like URN-NBN or the DOI, offer a resolution service available on the internet. The NBN namespace, as a namespace identifier (NID), has been registered and adopted by the Nordic Metadata Projects, but it is being separately implemented by individual systems, with no reference implementation that enables the coordination of information sources. In fact, several national libraries have developed their own NBN systems within national projects; several implementations are currently in use, each with different descriptive metadata or granularity levels. Given this, it is clear that the PIs cannot successfully support LOD trustworthiness.

The NBN-Italy service supports at least three levels of persistence (Bellini et al., "The National Bibliography Number Italia (NBN:IT) Project. A persistent identifier supporting national legal deposit for digital resources"):

1. Persistence of the identifier NBN. If the resource is no longer available online, the URN identifier will be maintained (e.g. as proof that at some point that resource existed);
2. Persistence of the association between URNs and URLs. It is a commitment that ensures that in the long term the URN is resolvable (which leads at least to an address of URL type). Accessibility of the resource is not guaranteed, but access to the so-called "Tombstone" is assured if the resource is no longer available on the network (e.g. "This ebook is no longer on the market");
3. Persistence of the resource referenced by NBN. This ensures the long-term existence and accessibility of the resource referenced by the URN. This is the level of persistence of NBN made possible thanks to storage (statutory or voluntary) at the national libraries and the authoritative description of the national bibliography.

Thanks to these levels of service, NBN-Italy names represent a clear added value if used in LOD architectures to support the trustworthiness of assertions (RDF triples). This proposal goes towards the integration of LOD and PI systems, by exploiting the ongoing initiatives and projects as outlined in the next paragraph.

Next steps: Den Haag Manifesto 2.0 and Florence Agenda

The forthcoming event "Cultural Heritage On Line 2012", to be held in Florence in December 2012, aims to improve and make effective the "Den Haag Manifesto" through the union of several ongoing related initiatives, projects and stakeholders like: APARSEN
