ebook img

The state of the art and practice in digital preservation PDF

14 Pages·0.45 MB·English
by  LeeK.H.
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview The state of the art and practice in digital preservation

Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology [J.Res.Natl.Inst.Stand.Technol.107,93–106(2002)] The State of the Art and Practice in Digital Preservation Volume 107 Number 1 January–February 2002 Kyong-Ho Lee, Oliver Slattery, Thegoalofdigitalpreservationisto Keywords: digitalpreservation;emulation; Richang Lu, Xiao Tang, and ensurelong-termaccesstodigitally encapsulation;migration;standardization; Victor McCrary storedinformation.Inthispaper,we XML. presentasurveyoftechniquesusedin National Institute of Standards and digitalpreservation.Wealsointroduce representativedigitalpreservationprojects Technology, andcasestudiesthatprovideinsightinto Gaithersburg, MD 20899-8951 theadvantagesanddisadvantagesof differentpreservationstrategies.Finally,the Accepted: November29,2001 prosandconsofcurrentstrategies, [email protected] criticalissuesfordigitalpreservation,and [email protected] futuredirectionsarediscussed. [email protected] [email protected] [email protected] Availableonline:http://www.nist.gov/jres 1. Introduction Informationpreservationisoneofthemostimportant Theevolutionofdatastoragemediaandthedevelop- issuesinhumanhistory,culture,andeconomics,aswell mentofthepreservationtechnologycanbedescribedas as the development of our civilization. While earliest shown in Fig. 1. This diagram lists the various media informationwasrecordedincarvingsonstone,ceramic, used in data storage (both digital and analog) and the bamboo,orwood,thedevelopmentofcivilizationpaved techniques needed to ensure that the data on them is the way for new storage media and techniques for preserved. It also highlights the trend from analog to recordinginformation,suchaswritingonsilkorprint- digital/optical storage media and indicates the transfer ing on paper. Eventually we were able to put photo- of data from one generation of media to the next. It graphicimagesonfilmandmusiconrecords.Arevolu- is clear that while it is easier to create, amend, and tionarychangeoccurredintheinformationstoragefield distribute digital data, the media storing this data such with the invention of electronic storage media. as optical discs are not as robust as traditional analog Withtheadventofhigh-performancecomputingand media such as paper or film. In view of modern infor- high-speed networks, the use of digital technologies is mation preservation requirements, this paper will focus increasingrapidly.Digitaltechnologiesenableinforma- ontheaspectsofthetechnicalstrategiesusedindigital tion to be created, manipulated, disseminated, located, information preservation. and stored with increasing ease. Ensuring long-term Digital preservation involves the retention of both accesstothedigitallystoredinformationposesasignif- the information object and its meaning. It is therefore icant challenge, and is increasingly recognized as an necessarythatpreservationtechniquesbeabletounder- important part of digital data management [1,2]. stand and re-create the original form or function of the 93 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology Fig.1. Aprogressionofinformationpreservationstrategies. object to ensure its authenticity and accessibility. 2. Digital Preservation Techniques Preservation of digital information is complex because of the dependency digital information has on its This section introduces the current strategies for technical environment. Furthermore, as newer digital digitalpreservation.Techniquesforthepreservationof technologies rapidly appear and older ones are discon- digital information include technology preservation, tinued,informationthatreliesonobsoletetechnologies technology emulation, information migration, and soonbecomesinaccessible.Therefore,digitalresources encapsulation. present more difficult problems than conventional Digital resources can be stored on any medium analog media such as paper-based books [3]. that can represent their binary digits or bits, such as a Recently,severalapproachesfordigitalpreservation CD-ROM or a DVD. Rothenberg [10] defines a bit have been identified and presented. Conventional streamasanintendedmeaningfulsequenceofbitswith methodsaremainlytechnologyemulation,information no intervening spaces, punctuation, or formatting. To migration, and encapsulation [4,5,6,7,8]. However, preserve that bit stream, the first requirement is to thereisalackofprovenpreservationmethodstoensure ensurethatthebitstreamisstoredonastablemedium. that the information will continue to be readable. If the digital medium deteriorates or becomes obsolete To promote a solid understanding of the pros and before the digital information has been copied onto cons of different preservation techniques, this paper another medium, the data will be lost. triestopresentacomprehensivesurveyofthemthrough Therefore, digital preservation involves copying the areviewofawiderangeofliteratureandrepresentative digital information onto newer media before the old projects. We suggest a preservation strategy based on media becomes so obsolete that the data cannot be XML(eXtensibleMarkupLanguage)[9]forinteroper- accessed. This is referred to as copying or refreshing abilityandinterchangeabilityofpreserveddigitalinfor- [6,8].Thisprocesspreservestheintegrityofthedigital mation. The most appropriate preservation strategy information.Well-establishedtechniquesforpreserving should be determined by considering various aspects the integrity of the digital information exist at this includingcost-effectiveness,legalrestrictions,anduser level [5]. access requirements. However, as stated earlier, this For timely refreshing, the lifetime of digital storage paper particularly focuses on the technical issues of media must be predicted. The lifetime of a medium preservation. determinestheperiodoftimeinwhichtheinformation Thispaperisorganizedasfollows.Section2summa- recorded on the medium is stored safely without rizes various techniques for digital preservation. Their loss. The specification of the lifetime of the medium advantages and disadvantages are presented through a will prompt librarians or archivists to refresh their literature survey. By introducing current projects and media before medium deterioration. Recently, case studies, the application of these preservation Zwaneveld [11] suggested that media could be rated techniques is described in detail in Sec. 3. Finally, into five classes according to certain criteria including pointsofdiscussionandasummaryaregiveninSec.4. the lifetime. 94 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology However, simply copying the digital information is the behavior as well as the look and feel of the digital notalwayssufficientasapreservationstrategy[5].One object. For some digital objects, this may be the best hastoensurethattheinformationcanberetrievedand solutionatleastintheshort-termbecauseitensuresthat processedinthefuture.Retrievingabitstreamrequires thematerialisaccessiblebypreservingtheaccesstools a hardware device such as a disk drive for reading the as well as the object itself. physicalrepresentationofthebitsfromthemedium,as However, various issues including space, mainte- well as driver software. nance, and costs may make this impossible in the Existing preservation strategies can be broadly longer term. Specifically, equipment ages and breaks, classifiedintotwomainapproachesasshowninFig.2. documentationdisappears,vendorsupportvanishes,and The first is the more conservative approach where the the storage medium as well as the equipment deterio- original technological environment is fully preserved rates[14].Thisstrategyalsolimitstheportabilityofthe for decoding the digital information in the future. This resource since it will depend on hardware stored in approach can be further divided into two preservation specific locations [3]. techniques.Thefirstistopreservetheworkingreplicas of all computer hardware and software platforms 2.2 Technology Emulation for future use. This is referred to as the technology preservation strategy [5]. The other is to program the Thisstrategyhasalotincommonwiththetechnology newer computer systems to emulate on demand the preservationstrategy.Itinvolvespreservingtheoriginal olderobsoleteplatformsandoperatingsystems.Thisis application program. Emulator programs can be the so-called technology emulation strategy [10]. designed and run on future computer platforms. The emulator is programmed to mimic the behavior of old hardwareplatformsandoperatingsystemsoftware,such as for games and executable files [6,16]. However, this strategy does not involve preserving ageing hardware and original operating system software. Thegoalofemulationistopreservethelookandfeel of the digital object as well as its functionality. The essenceofthisstrategyistocopythetechnicalcontext of the resource allowing the original object or a refreshed copy of the original object to be used in the future.Rothenbergillustratestheconceptualviewofthe relationships among elements of the emulation process Fig.2. Existingpreservationapproaches. as shown in Fig. 3 [15]. This approach decouples The second approach is to overcome the technical application programs from the platform via a virtual obsolescence of file formats. It may also be classified machine with the result that applications can run on intotwotechniques.Thefirstistotransformorconvert differentplatformsbymigratingthevirtualmachineto the old digital resource to a format that is independent those platforms. The Java virtual machine might be an of the particular hardware and software that were example. applied to create them. This is called the information migration strategy [5]. The second technique, termed encapsulation, is where a digital object and anything else necessary to provide access to that object are grouped together and preserved [12]. 2.1 Technology Preservation This solution supposes that complete museums of obsolete equipment could be maintained in order to replicate any old configuration of hardware and software [13]. This strategy involves preserving an original application program, operating system soft- ware, and hardware platform [6]. The advocates of this strategy emphasize that the originalenvironmentneedstoberuntoreallypreserve Fig.3. Rothenberg’semulation-basedpreservation[15]. 95 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology Hendley sees emulation as a short to medium term The emulation approach requires detailed specifica- strategy or a specialist strategy where the need to tions for the outdated hardware and operating system maintain the physical presence of the original digital software [10]. Therefore, standard formats for the resource is of great importance to the users. Again, he emulation specification need to be developed. The seestechnologyemulationbeingprimarilyusedincases extent to which emulation should mimic the original where digital resources cannot be converted into technical environment entirely or emulates only those software independent formats and migrated forward. components necessary to access the data remains an Waugh et al. [12] indicate that the application issue for debate. softwareitselfcouldcontainvirusesthatwouldresultin thelossofinformationovertime.Particularly,theynote 2.3 Information Migration thatemulationrequirespreservingasignificantamount of information. They consider it useful if the goal is to The migration of digital information refers to the preserve the software as an artifact itself and if future periodictransferofdigitalmaterialsfromonehardware user organizations lack sufficient knowledge to under- and software configuration to another, or from one standtheformatofthedigitalinformation.Granger[16] generation of computer technology to a subsequent discussesitspotentialadvantagessuchasthere-creation generation[5].Thepurposeofmigrationistopreserve of the look and feel of the resource. They identify the theintegrityofdigitalobjectsandtoretaintheabilityof possible disadvantages as the undefined nature of users to retrieve, display, and use them in the face of technological change and the complexity of creating constantly changing technology. emulator specifications. His conclusion is that The Open Archiving Information System (OAIS) emulationisnotacompletedigitalpreservationsolution model [22], developed by the Consultative Committee but a partial one. for Space Data Systems (CCSDS)2, breaks migration On the contrary, Russell [3] states that emulation intofourcategories:refreshment,replication,repackag- potentially offers the best solution for very long-term ing, and transformation. Refreshment ensures that a preservationofdigitalmaterial,especiallyforresources reliable copy of the bit stream of a digital object is forwhichthevalueisunknownandwherefutureuseof maintained while replication and repackaging ensure the material is unlikely. Woodyard [1] points out that thatamanageablepackageoftheobjectisavailable.On emulation could be a more suitable solution than theotherhand,transformationactuallymodifiesthebit migration for long-term access to complex digital streamofadigitalobjectandthisiswhatisconsidered resources such as executable files. Gilheany [18] as the process of migration in this paper. discusses the need for emulators to permanently Rather than focus on the technology, information preserve the functionality of computers. Rothenberg migrationtendstofocusontheintellectualcontentand advocates and recommends emulation as the best on ensuring its accessibility using current technology possible solution [10,18] and presents results from the [3]. This strategy could be facilitated through copying first phase of an emulation experiment [15] in the thedigitalinformationtoananalogmediumorthrough Networked European Deposit Library (NEDLIB)1 application programs that are backward compatible. project [19]. It could also be facilitated through the conversion of Holdsworth and Wheatley [20] consider the method digital resources into a small number of standard ofRothenberg,inwhichtheproductionofanemulator formatsthatarehardwareandsoftwareindependent[6]. can be postponed by instead producing an emulator The first solution involves the transfer of digital specificationatthetimeofplatformobsolescence,asa resourcesfromlessstabletomorestableanalogmedia very risky approach to preserving information forever. such as paper or microfilm [23]. While analog media Thisisbecausethereisnoguaranteethataspecification have better proven long-term reliability, they may not produced can be used to produce an emulator in the provideadequaterepresentationsoftheoriginalobject. future.Theypresentsomeillustrativeguidelinesforthe Thissolutionmayalsoleadtoseverelossinfunctional- use of emulation and argue that emulation is a valid ity and presentation of the original digital object. It is methodofdigitalpreservation,atleastforsystemswhere not possible, for example, to microfilm the equations the documentation is not available in electronic form. embedded in a spreadsheet, to print out an interactive Holdsworth [21] also suggests the use of widely video, or to preserve a multimedia document as a flat available programming languages such as C in the file [5,18,24]. implementation of emulation. 1http://www.konbib.nl/nedlib/ 2http://www.ccsds.org/ 96 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology The second solution, software that is backward longerwork.Therefore,thepreservationprobleminthis compatible,willsimplifymigration.Thelatestversions case is obviously focused on the maintenance of the ofmostpopularwordprocessingpackages,forexample, migration tool. Preserving the original bit stream and a willbeabletodecodefilescreatedonearlierversionsof migration tool is compatible with emulation strategy. the same package. While this strategy may work over The task force report and Hendley consider this the short term for simple digital resources created on migration strategy the most promising for the future. some of the leading application packages, it cannot be Bearman[26]alsobelievesthatitisthemostpromising relied upon over the medium to long-term or for more strategy for preservation of electronic records and the complex digital resources. Meanwhile, interoperability only one that has worked to date. Russell [3] points of systems will also facilitate migration because a out that the migration strategy is the most practical specific program is not needed to access a digital approach at least for the short and medium term. resource. However, these features become harder to However, for resources where it is difficult to dis- achieve with greater software complexity. Hence, any entangle format from content such as complex multi- information migration through backward compatibility media documents, this is not an easy option. Multiple or interoperability between application software components may require separate migration activities packages would represent a short-term strategy [6]. and this can be very complex. Indeed, for some multi- Another approach entails initially migrating digital mediaresources,migrationmaynotbepossiblewithout information from the great multiplicity of formats to a significant compromises in functionality. Russell states smaller more manageable number of standard formats. thatthecostsofmigrationmay,inthelongrun,exceed These are less volatile than the wide array of non- those costs necessary for preserving either the technol- standard formats and can still encode the complexity ogyitselforthedetailedtechnicalspecificationthatwill of structure and form of the original resource [5]. allow future emulation. Decisionsonwhichformatstoconvertdigitalresources Waugh et al. believe that the key to successful into should be based on the structure of the digital migration is the knowledge of the original data format resources, on the objectives set by the collection andaclosematchinfunctionalitybetweentheoriginal manager, and on the requirements of the users of that and replacement format. They consider migration the collection [6]. For example, one important issue is simplest over the short and medium term for digital whetherthetoppriorityisgiventopreservingtheability informationthatisactivelymanaged.Ontheotherhand, to process the digital resource or to preserving the Rothenberg dismisses the strategy of migrating format or visual presentation of the digital resource. electronic records systematically before they become A report [5] from the task force on the Commission inaccessible [18]. on Preservation and Access (CPA) and the Research Lawrence et al. [27] have attempted to quantify the LibrariesGroup(RLG)makesthepointthattherearea risksinvolvedintheuseofmigrationandhaveanalyzed variety of migration strategies. Particularly, Wheatley several commercially available migration tools for their [25] attempted to break migration into more specific relative accuracy. This practical investigation has led to casesintheCreativeArchivingatMichiganandLeeds: the identification of some key requirements for migra- EmulatingtheOldandtheNew(CAMiLEON)3project: tion software: access to the source file specification, minimummigration,preservationmigration,recreation, analysis of the differences between it and the target human conversion migration, and automatic conversion format,identificationofthedegreeofriskinthecaseof migration. He applied each case to test materials and a mismatch, accurate conversion of the source file to a discussed its usefulness for digital preservation. target specification, and so on. Recently, the concept of migration on request However,noneofcurrentmigrationmethodsortools has been conceived by the Consortium of University comes close to meeting all of these requirements. It is Research Libraries Exemplars for Digital Archives importanttorealizethattechnicalstandardsmaychange (CEDARS)4 project [3], in which original objects are rapidlyandthatthisstrategymaynotensurethatdigital maintainedandpreservedinadditiontoamigrationtool information remains accessible. Recently, however, that runs on a current computing platform and can be XML has become widely accepted as a universal employed by users as necessary. When the current standard format in various fields of digital library, platform becomes obsolete, the migration tool will no electronic commerce, and the Web. Standardization basedonXMLcouldbehelpfulinaddressingthedigital 3http://www.si.umich.edu/CAMILEON/ preservation problem. 4http://www.leeds.ac.uk/cedars/ 97 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology 2.4 Encapsulation approach. However, Bearman [26] disputes the theory of Rothenberg stating that it is not clear as to how Encapsulation aims to overcome the problems of the metadata encapsulation strategies may be practically technological obsolescence of file formats by making implemented. the details of how to interpret the digital object part of Waughetal.arguethatencapsulationisthebestbasis the encapsulated information. This strategy involves for long-term preservation. They introduce three creatingtheoriginalapplicationthatwasusedtocreate challenges of encapsulation: the requirements for or access the digital object on future computer plat- applications to generate encapsulated records, the forms. Part of the process of encapsulation may be to potential storage overhead of including documentation migratetherecordtoamoreeasilydocumentedformat. about the format within each record, and information TheconceptofencapsulationissimilartotheBento5 aboutunpublisheddataformats.However,evenifformat [28] container, which was developed to increase specifications are publicly available, they are often compatibility of data between computer applications. incompleteandsubstantialcomponentsoffilespecifica- Bento is a specification for storage and interchange of tions often consist of nonstandard elements. compound content, and is designed to be platform and Encapsulation can be considered to be a type of contentneutral.Thus,itprovidesaconvenientcontainer migration technique. Although documentation may fortransportinganytypeofcompoundcontentbetween delaytheneedformigrationforalongtime,theencap- multiple platforms. sulatedinformationwilleventuallyneedtobemigrated. Encapsulation can be achieved by using physical or Therefore, encapsulation techniques can be applied to logical structures called containers or wrappers to the digital resources whose format is well known and provide a relationship between all information compo- that are unlikely to be accessed actively. nents such as the digital object and some supporting information including metadata [29,30]. The reference 2.5 The Digital Tablet modelfortheOAISalsodescribesthetypesofsupport- inginformationthatshouldbeincludedinanencapsula- Kranch [36,37] proposes developing a digital tablet tion. They include the representation information used for preservation. The digital tablet technique does not to interpret the bits appropriately, the provenance to preciselyfitintotheabovecategories.Thetabletwould describethesourceoftheobject,thecontexttodescribe have a self-contained power source, present the stored how the object relates to other information outside the digitalinformationonascreenasglyphsfromawritten container, a reference to one or more identifiers to language appropriate for the information, and have uniquely identify the object, and fixity to provide touch-sensitive controls to change the presentation and evidence that the object has not been altered. manipulate the information. Additionally, it should be TheUniversalPreservationFormat(UPF)[31,32,33] able to withstand millennia of neglect under harsh is a method being developed for digital preservation, conditions yet cost no more than a few dollars to pro- basedonthetheoryofencapsulation.Itisaself-describ- duce and encode. It should have a storage capacity of ingstoragetechnology,whichusesawrappertoholdthe dozens or hundreds of terabytes, and contain a simple digital object and the metadata together to protect serial read-only port to download the original digital againsttechnologicalobsolescence.TheDigitalRosetta information into an external system along with instruc- Stone [34] is a method for storing the representation tionsonhowtodoit.However,thedigitaltabletmaybe information needed to interpret the digital content of considered to be another technology preservation an object separate from the encapsulation to avoid method. duplicationofeffortandinefficientuseofstoragespace. Encapsulation has been widely promoted by Rothenberg who is a strong advocate of encapsulation 3. Projects and Case Studies andemulationmethodsfordigitalpreservation.Day[30] andShepard[35]havealsosupportedtheencapsulation This section introduces representative projects for digital preservation, describes their goals and results, and presents their specific preservation strategies. 5Certaincommercialequipment,instruments,ormaterialsareidenti- Recently,CloonanandSanett[38]presentedareporton fiedinthispapertofosterunderstanding.Suchidentificationdoesnot the initial phase of projects involved in developing, imply recommendation or endorsement by the National Institute of evaluating, and/or implementing digital preservation Standards and Technology, nor does it imply that the materials or equipmentidentifiedarenecessarilythebestavailableforthepurpose. strategies in Europe, Australia, and the United States. 98 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology 3.1 Australian Projects The project is based on the OAIS model described earlier.Essentiallyitisanarchivalmodelforanarchiv- Australia has been examining digital preservation ingsystembutdoesnotexplicitlyincludeapreservation issues since 1994. Several projects are aiming to pre- module. Current projects investigating the use of the serve various digital materials such as electronic modelincludeNEDLIBinEurope,CEDARSintheUK, records, online publications, digital audio resources, and PANDORA in Australia. theses, and cartographic materials [1,12,39]. The The CEDARS project aims to address the strategic, VictorianElectronicRecordsStrategy(VERS)6project methodological,andpracticalissuesofdigitalpreserva- [12]producedastandardformanagementandpreserva- tion as well as providing guidance for libraries in best tion of electronic records [40,41]. The standard pro- practices. Specifically, the project is producing guide- posedbytheVERSprojectrecommendsencapsulating lines for developing digital collection management thedocumentsandtheircontextinasingleobjectbased policies and preserving different classes of digital on XML. resources. Additionally, it is performing an analysis of ThePreservingandAccessingNetworkedDocumen- thecostimplicationsofdigitalpreservation.Theproject tary Resources of Australia (PANDORA)7 project by isrunningpilotprojectstotestandpromotethechosen the National Library of Australia has led the way in strategy for digital preservation. The CEDARS Data archiving the Web [42,43]. The primary objective of Preservation Strategies Working Group is looking at PANDORAistocapture,archive,andprovidelong-term preservation issues that are related with migration, access to significant online publications. The project emulation, and data refreshing. aims at addressing both archiving and preservation Specifically,theprojecthasdonesomeworkcompar- processes.ThePANDORAarchivingprocessesreferto ingmigrationandemulationacrossolderdigitalmateri- the collection and provision of immediate access to the alsinconjunctionwiththeCAMiLEONproject[20,25]. publications while preservation processes involve They suggest that both migration and emulation strate- managingthematerialsandapplyingappropriatestrate- gies are viable for different types of digital materials. gies (e.g., migration) to ensure long-term access. The Theyalsobelievethatnostrategyisapanacea,andthe archiving processes have been developed while strategy adopted for providing access to preserved techniques for managing long-term access to these resources will very much depend on the nature of the digital resources is still being developed. resourceitselfandthereasonforitspreservation.More The project has also embarked on migration experi- work is planned to investigate the information loss ments with some HTML pages. The format of these associated with each strategy. pages is not yet obsolete, but the HTML specification [44]hasdeclaredanumberofmark-uptagsasdeadand not to be supported in this or future versions. The aim 3.3 CAMiLEON of these trials is to make changes in the HTML source codetoremovetagsdeclaredasdeadandreplacethem The CAMiLEON project is funded by the Joint withcurrenttags,effectivelymigratingthesourcecode Information Systems Committee (JISC) in the UK and to a different version to reduce problems of future theNationalScienceFoundation(NSF)intheUSA.The compatibility with Web browsers. Additionally, the projectlooksattheissuessurroundingtheimplementa- National Archives of Australia is undertaking a project tion of technology emulation as a digital preservation todevelopadviceforcommonwealthagenciesonusing strategy.Theprojectrecognizesthepotentialofemula- migration as a preservation strategy for electronic tion for the retention of the functionality and the look records [45]. andfeelofdigitalobjects[20].Itaimstodeveloptools, guidelines,andcostsforemulationascomparedtoother digital preservation options. 3.2 CEDARS The project is performing user testing of various The Consortium of University Research Libraries, digitalresourcesbothintheiroriginalenvironmentsand which represents both university and national libraries in emulated environments. The project presents guide- across the UK and Ireland, leads the CEDARS project. linesforuseofemulationandarguesthatemulationisa valid method for both complex digital resources that 6http://www.prov.vic.gov.au/vers/ include executable files and resources for which the 7http://www.nla.gov.au/policy/plan/pandora.html documentation is not available in electronic form. 99 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology 3.4 NEDLIB 3.6 Library of Congress The NEDLIB project was initiated by a permanent Recently, the Computer Science and Telecommuni- standing committee of the Conference of European cations Board (CSTB) of the National Academies NationalLibraries(CENL)in1998,withfundingfrom [46,47] convened the Committee on the Information the European Commission’s Telematics Application Technology Strategy for the Library of Congress for Programme. Eight national libraries in Europe, one advice on digital preservation. The committee report nationalarchive,andmajorpublishersareparticipating includes specific recommendations for enhancing intheproject.TheNationalLibraryoftheNetherlands technology infrastructure, particularly in the area of leads the project. networks, databases, and information technology The project aims to develop a common architectural security. frameworkandbasictoolsforbuildingdepositsystems TheLibraryofCongress’pilotproject,workingwith forelectronicpublications.Theprojecthasalsoadopted theInternetArchive9,hasworkedthroughallaspectsof the OAIS migration model described earlier. The main archivingtheWebintheareaofpoliticalWebsites.This objective of the NEDLIB project is to provide better projectusestheDigitalLibrarySunSITECollectionand insight into the merit and weakness of different long- PreservationPolicy10fromtheUniversityofCalifornia, term preservation strategies. The project defines the Berkeley, which provides several digital collecting characteristics of electronic publications and other levels, as guidance. categories of digital deposit material and associated preservation and authenticity requirements. It is recog- 3.7 NARA: Persistent Archives and Electronic nized that many aspects including cost-effectiveness, Records Management legal restrictions, agreements with publishers, and user access requirements ultimately need to be taken into The NARA (National Archives and Records Admin- accountwhenpolicychoicesforpreservationstrategies istration)project,11whichisledbytheSanDiegoSuper- are set. However, the project focuses on the technical computerCenterandfundedbyNARA,aimstodevelop issues of preservation. a persistent archive to support ingestion, archival Theprojecthastakenafirststeptotestthetechnical- storage, information discovery, and preservation of ities of the preservation mechanisms by starting an digitalcollections.Oneofitspremisesistheimportance emulation experiment. The fundamental idea of the of preserving the organization of digital collections work is to test whether emulating obsolete computer simultaneouslywiththedigitalobjectsthatcomprisethe hardware on future systems could be used to ensure collection [48]. long-termaccesstodigitalpublications.Rothenberghas Theultimategoalistopreservenotonlytheoriginal performed the first phase of this experimental work data, but also the context that permits the data to be [15]. It involved developing a prototype experimental interpreted. The project proposes an approach for environment for trying out emulation-based preserva- maintaining digital data for hundreds of years through tion and using commercial emulation tools to provide developmentofanenvironmentthatsupportsmigration an initial proof-of-concept. The experimental results ofcollectionsontonewsoftwaresystems.Theproposed indicatethatemulationshouldworkinprinciple,assum- infrastructure combines elements from supercomputer ing that suitable emulators for obsolete computing centers, digital libraries, and distributed computing platforms can be hosted on future platforms. environments.Theprojectemphasizesthesynergythat is achieved through the identification of the unique 3.5 Kulturarw3 Heritage capabilities provided by each environment, and the construction of interoperability mechanisms for The Kulturarw3 Heritage Project8 of the Royal integrating these environments. According to this Library in Sweden is testing methods of collecting, project, collection-based persistent archives are now archiving, and providing access to Swedish electronic feasible and can manage massive amounts of informa- documents.Webcrawlersorrobotsareusedinorderto tion based on XML [49]. collect all the Swedish Web pages automatically. Althoughtheprojectcurrentlydoesnotfocusonpreser- vation,itisgrowingintoabroaderNordicinitiativethat may explore the long-term preservation of this archive. 9http://www.archive.org/ 10http://sunsite.berkeley.edu/ 11http://www.sdsc.edu/NARA/ 8http://kulturarw3.kb.se/html/projectdescription.html 12http://www.interPARES.org/ 100 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology 3.8 InterPARES users to define preservation policies and enforce them automatically. The International Research on Permanent Authentic RecordsElectronicSystems(InterPARES)12project[38] 3.10 Canadian Projects is a multinational research initiative, in which archival scholars, computer engineering scholars, national E-preservation14wasdevelopedthroughacooperative archivalinstitutions,andprivateindustryrepresentatives effort between the National Library of Canada and the are collaborating to develop the theoretical knowledge Canadian Initiative on Digital Libraries (CIDL). and methodology required for the permanent preserva- E-preservationisintended1)toprovideCanadianswith tion of authentic records created using electronic easyaccesstopoliciesand2)toperformresearchonthe systems [50]. creation,use,andpreservationofdigitalcollections.The Theresearchareasincludeidentifyingtheelectronic project includes guidelines about various aspects recordelementsthatneedtobemaintained,developing including acquiring digital materials, formats, and criteria to appraise electronic records for preservation, metadata. and formulating principles for development of inter- national, national, and organizational preservation 3.11 PreservationProjectsattheNationalInstitute strategies. Specifically, the research areas are divided of Standards and Technology (NIST) into four complementary domains: authenticity, appraisal, preservation, and strategies. In terms of TheearliestworkondatapreservationatNISTcanbe authenticity, the purpose is to identify the specific traced back to the 1980s when Podio performed elements of electronic records that must be preserved, researchonthelifetimemeasurementofcompactdiscs overtimeandacrosstechnologies,inordertoverifythe [51].Hisworkprovidedabasisforastandardmethodol- record’sauthenticity.Asafirststep,atemplatetoguide ogyforthelifetimemeasurementofopticaldiscs.With the analysis of electronic records was developed and the increasing usage of digital storage in libraries and is being evaluated. The project is based in the School the archiving of the government agencies, the great of Library, Archival and Information Studies at the importance of digital preservation became clear to University of British Columbia in Canada. The current NIST’s Information Technology Laboratory,15 and phaseoftheprojectwouldendonDecember31,2001. accordingly new projects on the study of digital data preservation have started in the following aspects: • Longevitytesting.Thisprojectinitiallyconsistedof 3.9 PRISM an examination of the effects of heat and humidity The Preservation, Reliability, Interoperability, on the lifetime of optical discs and was later Security, Metadata (PRISM)13 project [29] of the extended to include the effects of light exposure. Cornell University is a 4year project funded by the Thefocusisnotonlyonthelifetimeitself,butalso Digital Library Initiative to investigate and develop on the deterioration process. The results may be policies and mechanisms needed for information useful both for new disc production and for the integrity in digital libraries. The project focuses on classification of existing recorded discs. long-termsurvivabilityofdigitalinformation,reliability of information resources and services, interoperability, • Testingofinterchangeabilityandinteroperabilityof security, and metadata. opticaldiscsforuseinhigh-densitystoragesystems Thecurrentdirectionoftheprojectistowarddevelop- suchasopticaldisc“Jukeboxes.”Combinedwiththe ing techniques for monitoring the integrity of dis- application and further development of XML, a tributedWeb-basedinformationresourcesandenforcing new preservation strategy may be developed. This preservation policies set by the owners and users of programisbeingconductedincollaborationwiththe collections. Monitoring resources will involve both the High Density Storage Association (HDSA).16 An automated capture of information using a specialized open testing laboratory is being developed that will Web crawler and the manual gathering of data on the include interoperability and interchangeability organizationalstatusofparticularresourcesandcollec- testingaswellastestingofthesuitabilityofvarious tions. The ultimate objective is to develop a cost-effec- types of high capacity storage systems for different tive and event-based metadata scheme that will enable applications including preservation. 14http://www.nlc-bnc.ca/cidl/preserv-conserv/index.htm 15http://www.itl.nist.gov/ 13http://www.prism.cornell.edu/ 16http://www.hidensity.org/ 101 Volume107,Number1,January–February2002 Journal of Research of the National Institute of Standards and Technology • Development of the Turbo coding system [52]. 4. Discussion and Summary Unlikethetraditionaldigitalpreservationtechniques thataimtokeeptheinformationreadableinthelong Inthispaper,threemainpreservationtechniqueshave term, this new technique aims to develop a method been discussed in detail. Each of them has advantages for finding and recovering useful information from anddisadvantagesasshowninTable1.Differentstrate- failed discs. The failure of many commercial error giesareviablefordifferenttypesofdigitalmaterials.In correction codes to support retrieval of important cases of complex resources and application softwares information from aging or damaged digital storage suchasgamesandexecutablefiles,emulationisasuit- devices has indirectly resulted in the reduction of able approach. In particular, emulation is the most the rated life expectancy and archival properties feasible choice where there is a lack of sufficient of digital storage devices. Turbo code has shown knowledge regarding the format of the digital informa- promise in retrieving information previously tion and where the look and feel of digital information inaccessible.Turbocodesaretwoparallel,recursive, is important. and systematic convolution codes. These codes On the other hand, migration or encapsulation is are used for the channel coding and decoding in appropriate for digital resources where knowledge of order to detect and correct the errors that may the format is sufficient and where the resource is occur in the transmission of digital data through relatively simple. Specifically, migration is appropriate different channels. The iterative method of the for resources that are actively accessed and managed. decoding scheme helps to achieve the theoretical Encapsulationissuitableforresourcesthatareunlikely limit (near Shannon-limit [53]) in error correction to be actively accessed. Fig. 4 shows a schematic performance. This program is being conducted in diagram to select the suitable preservation techniques cooperation with Carnegie-Mellon University. fordigitalresourcesaccordingtothetypeandcomplex- ity of digital information, the availability of the data format, and its usage. Table1. Advantagesanddisadvantagesofpreservationtechniques Technique Advantages Disadvantages Domain Emulation Maintainsthelook Thecomplexityofcreating Applicationsoftware. andfeel. emulatorspecifications. Complexdigitalresourcessuchas Thelargeamountof thosethatcontainexecutablefiles. informationthatmustbe preserved. Resourcesforwhichthereisalack ofsufficientknowledge. Archaicsoftwarerequired toaccessinformation. Resourcesforwhichthevalueisunknown andforwhichfutureuseisunlikely. Resourceswhoselookandfeelareimportant. Migration Doesnotneedtoretain Significantcostforlong-term Resourcesthatareactivelyaccessedand originalapplications. preservation. managed, such as scientific data or database. Supportsactiveaccess Informationdegradation. Resourceswhoseformatsaresufficiently andmanagement. wellknown. Lackofpreservationmetadata. Needforcontinueddiligence onthepartofarchivists. Encapsulation Maintainspreservation Knowledgeabouttheformat Resourcesthatareunlikelytobeaccessed information. mustbepreserved. andmanagedactively. Systemsrequiredforcapturing Resourceswhoseformatsaresufficiently thedigitalinformation. well-known. 102

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.