digital investigation 6 (2009) S57–S68
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/diin

Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow

Michael Cohen*, Simson Garfinkel, Bradley Schatz

Australian Federal Police, High Tech Crime Operations, 203 Wharf St., Spring Hill, Brisbane 4001, Australia

* Corresponding author. Tel.: +61 7 3222 1361.
E-mail address: [email protected] (M. Cohen).
1742-2876/$ – see front matter © 2009 Digital Forensic Research Workshop. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.diin.2009.06.010

Abstract

Forensic analysis requires the acquisition and management of many different types of evidence, including individual disk drives, RAID sets, network packets, memory images, and extracted files. Often the same evidence is reviewed by several different tools or examiners in different locations. We propose a backwards-compatible redesign of the Advanced Forensic Format, an open, extensible file format for storing and sharing evidence, arbitrary case related information and analysis results among different tools. The new specification, termed AFF4, is designed to be simple to implement, and is built upon the well supported ZIP file format specification. Furthermore, the AFF4 implementation is backward compatible with existing AFF files.

© 2009 Digital Forensic Research Workshop. Published by Elsevier Ltd. All rights reserved.

Keywords: Digital forensics; Image; Hard disk imaging; Digital evidence management; Distributed storage; Distributed forensic analysis; Forensic file format; Evidence archiving; Cryptography; Forensic integrity

1. Introduction

Storing and managing digital evidence is becoming increasingly difficult as the volume and size of digital evidence increase. Evidence sources have also evolved to include data other than disk images, such as memory images, network images and regular files. Preserving such digital evidence is an important part of most digital investigations (Carrier and Spafford, 2004), and managing the evidence in a distributed organization is now emerging as a critical requirement.

This paper presents a framework for managing and storing digital evidence. We first examine existing evidence management file formats and outline their strengths and limitations. We then explain how the proposed Advanced Forensics Format (AFF4) framework extends these efforts into a universal evidence management system. The detailed description of the AFF4 proposal is then followed by concrete real world use cases.

1.1. Prior work

In recent years there has been a steady and growing interest in the actual file formats and containers used to store digital evidence. Early practitioners created exact bit-for-bit copies (commonly referred to as "dd images"). More recently, proprietary software systems for making and authenticating "images" of digital evidence have become common (e.g. B.S. NTI Forensics Source, 2008; Ilook investigator, 2008; Guidance Software, Inc., 2007). PyFlag (Cohen, 2008a) introduced a "seekable gzip" format that allowed disk images to be stored in compressed form while still permitting the random access to evidence data that forensic analysis requires.
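Seekable compressed formats share one principle, which a short Python sketch can illustrate. The data is split into fixed-size chunks, and each chunk is compressed independently and stored back to back. A table of offsets then lets a reader decompress only the chunks that overlap a requested byte range. This is a hypothetical illustration of the general technique, not the on-disk layout of seekable gzip, EWF or AFF4. The 32 KiB chunk size and the names `compress_chunks` and `read_range` are assumptions made for the example.

```python
import zlib
from typing import List, Tuple

CHUNK_SIZE = 32 * 1024  # assumed chunk size, for illustration only


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[bytes, List[int]]:
    """Compress data in fixed-size chunks stored back to back.

    Returns the concatenated compressed blob and a list of offsets at which
    each compressed chunk starts, followed by one final end offset.
    """
    blob = bytearray()
    offsets: List[int] = []
    for start in range(0, len(data), chunk_size):
        offsets.append(len(blob))
        blob += zlib.compress(data[start:start + chunk_size])
    offsets.append(len(blob))  # end marker: chunk i spans offsets[i]:offsets[i + 1]
    return bytes(blob), offsets


def read_range(blob: bytes, offsets: List[int], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the chunks involved."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(offsets) - 2)
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[offsets[i]:offsets[i + 1]])
    start = offset - first * chunk_size  # position of the request inside the first chunk
    return bytes(out[start:start + length])
```

For example, reading 100 bytes at offset 40000 decompresses only the second chunk. A single zlib stream over the whole image would instead have to be decompressed from its beginning.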
The Expert Witness Forensic (EWF) file format was originally developed for EnCase (Guidance Software, Inc., 2007), but was then adopted by other vendors (Kloet et al., 2008). The EWF file format similarly compresses the image into 32 KB chunks, which are stored back to back in groupings inside the file. The format employs tables of relative indexes to the compressed chunks to improve random access efficiency. EWF volumes have a maximum size limit of 2 GB, so an image is usually split across many files. EWF provides for a small number of predefined metadata fields to be stored within the file format.

The Advanced Forensic Format (AFF) expanded on this idea with a forensic file format that allowed both data and arbitrary metadata to be stored in a single digital archive (Garfinkel et al., 2006).

Both the AFF and EWF file formats are designed to store a single image, together with any metadata that implicitly refers to that image, such as sector size and acquisition date. Unlike EWF, AFF employed a system to store arbitrary name/value pairs for metadata. It used the same system both for user-specified metadata and for system metadata, such as sector size and device serial number. For example, Aimage, the AFF hard disk acquisition tool, does not only store the image. It additionally stores a description of the tool itself, the version of AFFLIB used to create the image, the computer on which the image was made, the operator of the tool, and the parameters the user supplied to the tool.

Schatz proposed a Sealed Digital Evidence Bags architecture, which facilitates composition of evidence and arbitrary evidence related information through a simple data model and a globally unique referencing scheme (Schatz and Clark, 2006).

1.2. This paper

An important advance of this work is the introduction of storage transformation functions to the forensic storage container. Prior works focused simply on forensically sound storage of bit-streams. The necessary work of translating low level storage into higher level abstractions was left to analysis tools, as transiently constructed artifacts. This applies at the aggregate block (i.e. RAID), volume, and filesystem layers. In contrast, AFF4 has mechanisms for describing transformations in a flexible and concise way, allowing users to view multiple transformations of the same data with little additional storage cost. This mechanism is an important enabler for inter-operable forensic tools. For example, carved files may be described in terms of their block allocation sequences from an image, rather than requiring the carved file to be copied again.

This paper extends previous work on the Advanced Forensic Format (AFF) by taking many of the concepts developed there and designing a new specification and toolset. The AFF4 format is a complete redesign of the architecture. The new architecture is capable of storing multiple heterogeneous data types that might arise in a modern digital investigation. These include data from multiple data storage devices, new data types (including network packets and memory images), extracted logical evidence, and forensic workflow. AFF4 also extends the format to make it the basis of a global distributed evidence management system.

We call the new system AFF4, and use the term AFF1 to refer to the legacy system developed by Garfinkel et al.¹ The publicly released AFF4 implementation is able to read existing AFF files.

2. The need for an improved forensic format

AFF1's flexibility came from its data model, in which forensic data and metadata are stored as arbitrary name/value pairs called segments. For example, the first 16 MB of a disk image is stored in a segment called page0, the second 16 MB in a segment called page1, etc. Because of this flexibility, it was relatively easy to extend AFF1 to support encryption, digital signatures, and the storage of new kinds of metadata such as chain-of-custody information (Garfinkel, 2009).

2.1. AFF limitations

We observed a number of practical problems in the underlying AFF1 standard and Garfinkel's AFFLIB implementation:

• While AFF1's design stores a single disk image in each evidence file, modern digital investigations typically involve many seized computers or pieces of media.
• The data model of AFF1 enabled storing metadata related to the contained image as (property, value) pairs. This data model does not, however, support expressing arbitrary information about more than one entity.
• AFF1 has no provision for storing memory images or intercepted network packets.
• AFF1 has no provision for storing extracted files analogous to the EnCase "Logical Evidence File" (L01) format, or for linking evidence to web pages.
• AFF1's encryption system leaks information about the contents of an evidence file because segment names are not encrypted.
• AFF1's default compression page size of 16 MB can impose significant overhead when accessing NTFS Master File Tables (MFT). These structures tend to be highly fragmented on systems that have seen significant use.
• The AFF1 specification calls for a "table of contents" stored at the end of AFF files, similar to the Zip (Katz, 2007) "central directory". However, Garfinkel never implemented this directory in AFFLIB, the publicly released AFF1 implementation. As a result, every header of every segment in an AFF file needs to be read when a file is opened. In practice this can take 10–30 s the first time a large AFF file is opened.
• AFF1's bit-level specification is essentially a simple container file specification. Other container file specifications are much more widely supported by both developer and end-user tools. It therefore seemed reasonable to migrate AFF from its home-grown format to one of the existing standards.

¹ Although Garfinkel never changed the AFF bit-level specification, he released AFFLIB implementations with major version numbers 1, 2 and 3. We therefore call our system AFF4 to avoid confusion.

2.2. Global distributed evidence management

AFF1 was designed for use on a single machine that could both image evidence and perform analysis. Many modern practitioners, however, work in distributed environments, in which imaging and analysis take place in multiple locations and are performed by multiple individuals.

Global distributed evidence management requires more than simply tracking the movement of disk images. It requires approaches for sharing evidence among multiple disconnected locations and for allowing offline work. It must then be possible to seamlessly recombine the analysts' work products in a third security domain.

Managing evidence in a globally distributed system requires globally unique identifiers, so that no name collisions can occur between disconnected locations. AFF1 assigns each piece of evidence a unique 128-bit identifier called a GID. However, it did not make clear when this identifier should be changed and when it should remain the same.

Consider the typical usage scenario depicted in Fig. 1, of a volume containing a disk image. This volume is distributed to two independent analysts, Alice and Bob. Alice may find and extract individual files, while Bob may correlate information in the evidence file with other data available on departmental servers. In some environments Alice and Bob may be able to work on a shared file located on a server. In other environments there will not be sufficient connectivity. Instead, each analyst will be required to store the information in their own evidence file, and these files will be recombined at a later point in time.

In this case, each analyst can create a new volume that extends the original volume and save their analysis in this new volume. To interchange their findings, they then only need to share the new volume with other analysts who already have a copy of the old volume. This is possible because each volume is independent of the others, but is still viewed as part of a bigger evidence set.

Fig. 1 – A typical usage scenario. Both Alice and Bob receive an AFF volume but work independently. Rather than modifying the volume, they each create their own local volumes and save their results into those files. They can now exchange the smaller new volumes and effectively merge their results into the same AFF set when they are finished.

3. Introducing AFF4

This section discusses the AFF4 terminology and architecture. The AFF4 design is object oriented, in that a few generic objects are presented with externally accessible behavior. We discuss a number of implementations of these high level concepts and show how they can be put together in common use cases.

• An AFF Object is the basic building block of our file format. AFF Objects have a globally unique name (URN) as described in (Sollins and Masinter, 1994; Fielding, 1995; Hoffman et al., 1998). The name is defined within the aff4 namespace, and is made unique by use of a unique identifier generated as per RFC4122 (Leach et al., 2005).
• A Relation is a factual statement used either to describe a relationship between two AFF Objects, or to describe some property of an object. A relation is a tuple of (Subject, Attribute, Value). All metadata is reduced to this tuple notation.
• An Evidence Volume is a type of AFF Object that provides storage for AFF segments. Volumes must provide a mechanism for storing and retrieving segments by their URN. We discuss two volume implementations below, namely the Zip64 based volume and the Directory based volume.
• A Stream is an AFF Object that allows clients to seek to, and read data at, arbitrary offsets. Stream objects implement abstracted storage, but must provide clients with a stream-like interface. For example, we discuss the Image stream used to store large images, the Map stream used to create transformations and the Encrypted stream used to provide encryption.
• A Segment is a single unit of data written to a volume. AFF4 segments have a segment name provided by their URN, a segment timestamp in GMT, and the segment contents. Segments are suitable for storing small quantities of data, and also present a stream interface.
• A Reference is a way of referring to objects by means of a Uniform Resource Identifier (URI). The URI can be another AFF Object URN, or a more general Uniform Resource Locator (URL), such as an HTTP or FTP object. This innovation allows objects in one volume to refer to objects in different volumes, facilitating data fusion and cross-referencing.
• The Resolver is a central data store that collects and resolves attributes for the different AFF Objects. The Resolver has universal visibility of objects from all volumes, and therefore guides implementations in resolving external references.

4. Metadata and the universal resolver

Management of evidence requires effective identification. Practitioners currently rely on acquisition time metadata such as the case identifiers and description fields of the EWF file format. They also use file and directory naming schemes and labels on evidence container hard drives. Evidence may also be referenced by external means in an inconsistent way. For example, in an investigator's case notes a disk image may be referred to by the name of the suspect (e.g.
Joe's hard disk), the case number or dates.

Such individuation schemes may be problematic when managing evidence automatically. For example, a suitably unique individuator may not be chosen at acquisition time. If that occurs, evidence container files may need to be renamed at analysis time to avoid name collisions.

The AFF4 design adopts a scheme of globally unique identifiers for identifying and referring to all evidence. We define an AFF4 specific URN scheme, which we call the AFF4 URN. URNs of this scheme use the namespace "aff4" (Sollins and Masinter, 1994) and therefore begin with the string "urn:aff4". AFF4 URNs are then made unique by use of a unique identifier generated as per RFC4122 (Leach et al., 2005). For example, an AFF4 URN might be urn:aff4:bcc02ea5-eeb3-40ce-90cf-7315daf2505e.

The AFF4 model treats metadata as an abstract concept, which may exist independently of the data itself. We define metadata to be a set of statements about objects, written in the tuple notation (Subject, Attribute, Value), where Subject is the URN of the object the statement is about. An Attribute can be any kind of value or relationship, such as the sector size of a device, a device capacity, or the name of the person who performed an imaging operation. A Value is the value of the attribute, which is either another URN or some textual value. Using this system we are able to store arbitrary attributes about any object in the AFF4 universe. Additionally, as these statements are universally scoped, they may be stored anywhere.

The AFF4 design extends beyond the management of a single volume, stream or image to a universal system for managing data of many types. This necessarily means that a single running instance is generally unable to see the entire AFF4 universe. For example, suppose a volume is opened that contains a Map Stream targeting a stream stored in a different volume. In general, the instance cannot tell where that other volume is actually stored.

To provide this global visibility of metadata, we define a central metadata management entity named the Universal Resolver. The Universal Resolver contains all the metadata about the AFF4 universe. That is, it is able to resolve queries for any attribute of any URN in the universe.

Although the resolver has complete visibility of all attributes, it is still useful to store metadata within the volume, particularly metadata pertaining to the volume itself. Otherwise, the volume would not be accessible to implementations that do not already have this metadata.

To this end, we define a way of serializing metadata statements (or tuples) into a standard format. Implementations can load this format into their respective resolvers when parsing the volume. Relations can be stored in segments whose URN ends with "properties", and the AFF4 implementation loads these segments automatically into the Universal Resolver. Relations are stored within the properties segment one per line. Each line holds the subject URN (encoded according to RFC1737), followed by whitespace and the attribute name. This is followed by an equals sign and the UTF-8 encoding of the value. An example properties file for an Image Stream is shown in Listing 1.

It is important to stress that the properties file is simply one way of serializing statements into volume segments. The statements may exist without being stored in a volume (for example, they may be stored on an external SQL server). Alternatively, they may be stored in some other way inside or outside the volume (e.g. in SQLite database files).

When the volume is loaded, the AFF4 implementation automatically loads any properties files and populates its Universal Resolver with the information visible to it. AFF4 also provides a mechanism for using an external resolver. For example, we have implemented a resolver that stores attributes in a MySQL database. This provides a persistent Universal Resolver that shares information between different instances on the same network.

The Universal Resolver should be thought of as a truly universal entity. In practice, the library provides a local resolver available to the running instance, and relations are added to it as the library explores different volumes. The AFF4 library therefore does not need an ideal Universal Resolver, but can approximate one by means of the local resolver. The user can prime the local resolver in advance by loading the volumes that may be needed to resolve internal references.

Each URN within the AFF4 universe must have an "aff4:type" attribute to denote the type of the Object. Objects may also have an "aff4:interface" attribute to denote what kind of interface they present (e.g. stream or volume).

5. Volumes

The volume object is responsible for providing storage for segments. Segments are stored and retrieved using their URNs. We describe two different implementations of volume objects, namely the Directory Volume and the Zip File Volume. It is easy to convert from one implementation to the other without affecting any external references.

It is important to emphasize that Volumes are merely containers that provide storage for segments. There is no restriction on which segments can be stored in any particular volume. For example, the segments that make up a single Image stream may be stored in a number of volumes, with the image split among them in some way. Similarly, the segments representing a number of streams may be stored in the same volume.

5.1. Directory volumes

The Directory Volume is the simplest type of volume. It stores segments in a single directory, based on their URNs. Some filesystems are unable to represent URNs accurately (e.g. Windows has many limitations on the characters allowed in a filename). The Directory Volume therefore encodes URNs according to RFC1738 (Berners-Lee et al., 1994): non-printable characters are escaped with a % followed by the hexadecimal ASCII code of the character.

The Directory Volume uses the aff4:stored attribute to provide a base URL. The URL for each segment is then constructed by appending the escaped segment URN to the base
The URL.NotethatthereisnorestrictiononwhattypeofURLthis digital investigation 6 (2009) S57–S68 S61 Listing 1 ExamplepropertiesfilesforseveralAFF4objects(URNsareshortenedforillustration). DirectoryVolume: urn:aff4:f901be8e-d4b2aff4:stored¼http://../case1/ urn:aff4:f901be8e-d4b2aff4:type¼directory ZipFileVolume: urn:aff4:98a6dad6-4918aff4:stored¼file:///file.zip urn:aff4:98a6dad6-4918aff4:type¼zip ImageStream: urn:aff4:83a3d6db-85d5aff4: stored¼urn:aff4:f901be8e-d4b2 urn:aff4:83a3d6db-85d5aff4:chunks_in_segment¼256 urn:aff4:83a3d6db-85d5aff4:chunk_size¼32k urn:aff4:83a3d6db-85d5aff4:type¼image urn:aff4:83a3d6db-85d5aff4:size¼5242880 MapStream: urn:aff4:ed8f1e7a-94aaaff4:target_period¼3 urn:aff4:ed8f1e7a-94aaaff4:image_period¼6 urn:aff4:ed8f1e7a-94aaaff4:blocksize¼64k urn:aff4:ed8f1e7a-94aaaff4: stored¼urn:aff4:83a3d6db-85d5 urn:aff4:ed8f1e7a-94aaaff4:type¼map urn:aff4:ed8f1e7a-94aaaff4:size¼0xA00000 LinkObject: mapaff4:target¼urn:aff4:ed8f1e7a-94aa mapaff4:type¼link IdentityObject: urn:aff4:identity/41:13aff4:common_name¼/C¼US/ST¼CA/L¼SanFrancisco/O¼Fort-Funston/CN¼client1/ emailAddress¼[email protected] urn:aff4:identity/41:13aff4:type¼identity urn:aff4:identity/41:13aff4:statement¼00000000 urn:aff4:identity/41:13aff4:x509¼urn:aff4:identity/41:13/cert.pem canbe,soitmaybealocationonafilesystem(e.g.file:///some/ (cid:2) Zip64 libraries are readily available making proprietary directory/) or a location on a HTTP server (e.g. http://intranet. implementations of interfaces to the AFF4 volume format server/some/path).Inthiswayitspossibletomovetheentire simpletowrite.Forexample,asimplepythonprogramto volumefromafilesystemtoawebservertransparently. dump out an Image stream (Section 6.1) is illustrated in The Directory Volume stores its own URN in a special Listing2. segmentnamed‘‘__URN__’’atthebaseofthedirectory. Fig.2showsthebasicstructureofaZiparchive.Ascanbe 5.2. 
Zip64volumes seen,thearchiveconsistsofaCentralDirectory(CD)locatedatthe endofthearchive.TheCDisalistofpointerstoindividualFile ForAFF4,wehavechangedthedefaultvolumecontainerfile headerstructureslocatedwithinthebodyofthearchive.Headers formattoZip64(Katz,2007).Therearemanyreasonsforthis arethenfollowedbythefiledata,afterithasbeencompressed decision: by the appropriate compression method (as specified in the header). Each archived file is optionally followed by a Data (cid:2) ThereisalreadywidesupportfortheZipandZip64formats. Descriptor describing the length and CRC of the archived file. Bymigratingtotheseformats,wecantakeadvantageofthe Usingthedatadescriptorfieldallowsimplementationstowrite richnumberofuseranddevelopertoolsalreadyavailable. archiveswithoutneedingtoseekintheoutputfile.Thisallows The volume may be inspected using any number of Zipfilestobewrittentopipesforexample,sendinganimage commercial or open source zip application (e.g. Windows overthenetworkusingnetcatorssh.AFF4alwaysusesthedata ExplorernativelysupportsZipfilesascanbeseeninFig.3, descriptionheadertoensurevolumesarewrittencontinuously Zip64issupportednativelybyJava,PythonandPERL). withoutneedingtoseekintheoutputfile. S62 digital investigation 6 (2009) S57–S68 Listing 2 SamplePythoncodetodumpoutanImageStream.Ascanbeseenthechunkindexsegmentisusedtoslicethedatasegmentintochunks. Thechunksaredecompressedandwrittentotheoutputfile. volume¼zipfile.ZipFile(INPUT_FILE) outfd¼open(OUTPUT_FILE,‘w’) count¼0 while1: idx_segment¼volume.read(STREAMþ‘‘/%08d.idx’’%count) bevy¼volume.read(STREAMþ‘‘/%08d’’count) indexes¼struct.unpack(‘‘<’’þ‘‘L’’* (len(idx_segment)/4),idx_segment) foriinrange(len(indexes)-1): chunk¼bevy[indexes[i]:indexes[iþ1]] outfd.write(zlib.decompress(chunk)) countþ¼1 It is important to note that AFF4 only requires that the important for a forensic user. 
Instead, we use AFF4’s digital volume be capable of storing multiple named segments of signaturefacilitiesforintegrityandnon-repudiation,andwe data. Although our AFF4 implementation uses the Zip64 file introduce a new stream based encryption scheme for formatasanunderlyingstoragemechanism,oursystemalso ensuringdataprivacy(Section6.4). supports legacy AFF1 volumes as well as Expert Witness AlthoughtherearenumerousZipimplementationsavail- Evidencefiles(Kloetetal.,2008). abletoday,wehavecreatedourownimplementation.There We ignore Zip64’s built-in support for splitting archives aremanyreasonstodevelopourownZip64implementation into multiple Zip files. Instead, our implementation treats forAFF4: eachvolumeasacompleteandstand-aloneZipfile.TheAFF4 implementation then considers the segments contained (cid:2) ThecommonlyavailableZipimplementationswritteninC withinasbelongingtotheuniversalcollection.Thisprovides donot implement the Zip64extensions. These extensions theabilitytosplitastreamacrossvolumesautomatically,as arerequiredtosupportEvidenceVolumeslargerthan2GB. differentsegmentswithinthesamestreammaybestoredin (cid:2) Simple Zip implementations might rescan the Central differentvolumes. Directoryforeachsegmentrequest.Sinceinpracticethere Zip64 also defines encryption and authentication exten- canbealargenumberofsegmentsinavolume,itisadvisable sions.Wedonotusethemduetotherestrictionsimposedon tohaveaZip64implementationthatisoptimizedtostoring their use and because they lack the functionality that is thousands(orevenhundredsofthousands)ofsegmentsinan efficientdatastructure.Infactourimplementationusesthe UniversalResolveritselftostoretheparsedcentraldirectory information,whichmeansthatinmostcaseswedonoteven needtoscantheCentralDirectoryatall. 
• The Zip specification duplicates data from the Central Directory entry in each File Header (such as filename, size, CRC etc.). However, many implementations that we have examined populate this information in only one of these places. In the interest of robustness, we wanted to ensure that both locations are populated, to allow recovery of at least some evidence from damaged volumes. If the central directory is lost, it is possible to scan through the volume, locate all the Zip64 file headers, and then repair and reconstruct the central directory.
• Our implementation supports simultaneous access by multiple readers and writers. Since our system requires all metadata to be shared through the Universal Resolver, it lends itself to Universal Locking on a per Object basis. For example, a process that wants to add a new segment to a Zip volume can lock the volume via the Resolver, add the segment, and then unlock the volume object in the resolver. This prevents concurrent access by other programs, even on different machines.

Fig. 2 – The basic structure of a Zip archive. Also shown is how new archive members are added to an existing Zip File. The Central Directory is overwritten by the new member, and a new Central Directory is written at the end.

Fig. 3 – An Image stream browsed from Windows Explorer. Basic access to the evidence volume can be made using familiar tools, improving transparency.

6. Streams

The Stream system provides random access to an abstract representation of a body of data. Our implementation allows the segments in a stream to be operated on as if they were a single file, by supporting the traditional POSIX-like functions open(), seek(), write(), and read(). All streams also have a "size" attribute to denote the last addressable byte within the stream. This attribute is required to support the POSIX whence argument, which may require seeking from the end of the stream.

The following sections describe a number of types of streams. It is important to note that clients of our implementation do not need to care how a particular stream is implemented. Streams are opened by their URNs, and the library itself ensures that they provide the Stream interface. For example, users do not need to know whether a stream is a Map Stream or an Image Stream, because the interface provided is the same.

6.1. The image stream

The AFF4 Image Stream stores a single read-only forensic data set. For example, this stream might contain a hard disk image, a memory image or a network capture (in PCAP format). Image streams have an aff4:type attribute of image.

The data is stored in multiple data segments on various volumes. Data segment URNs are derived by appending an 8 digit, zero padded decimal representation of an incrementing id to the stream URN (e.g. "urn:aff4:83a3d6db-85d5/00000032"). Each data segment is called a bevy, and stores a number of compressed chunks back to back.

The chunk index segment contains a list of relative offsets to the beginning of each chunk within the bevy. The chunk index segment URN is derived by appending ".idx" to the bevy URN. This is illustrated in Fig. 4.

Image streams specify the chunk_size attribute as the number of image bytes each chunk contains (the chunk size defaults to 32 KB). They also specify the chunks_per_segment attribute, which gives the number of chunks stored in each bevy. Each chunk is compressed individually using the zlib compress algorithm. This general structure of storing chunks within larger segments is similar to the technique used by the Expert Witness file format (EWF). EWF is used by EnCase (Keightley, 2003) and implemented by the open source libewf package (Kloet et al., 2008). Compared with AFF1's 16 MB segment size, this results in a better match between the requested size and the minimum amount of data that must be decompressed. Less data is decompressed unnecessarily when small sectors are read randomly, leading to vast performance improvements.

Fig. 4 – The structure of Image Stream Bevies. Each bevy is a collection of compressed chunks stored back to back. Relative chunk offsets are stored in the chunk index segment.

6.2. The map stream

Linear transformations of data are commonplace in forensic analysis. For example, a file is often simply a collection of bytes drawn from an image, while a TCP/IP stream is simply a collection of payloads from selected network packets. Sometimes the same data may be viewed in a number of ways. For example, a Virtual Address Space is a mapping of the Physical Address Space through a page table transformation (Tanenbaum, 2008). Zero storage carving (2006) is a way of specifying carved files in terms of a sequence of blocks taken from the image. Cohen extended this concept to an arbitrary mapping function (Cohen, 2008b, 2007), which can describe arbitrary mappings of carved files within a single image.

In this work we extend the mapping function concept to allow a single map to draw data from arbitrary streams (called targets). This transform is implemented by the Map stream.

The mapping function is described in a segment whose name is formed by appending "/map" to the stream URN. The segment data consists of a series of lines, each containing a stream offset, a target offset and a target URN. Offsets are encoded in decimal notation.

Denote the stream offset by x and the target offset by y. The Map then specifies a set of points (X_i, Y_i, T_i). A read request for the byte at mapped stream offset x can be satisfied by reading the byte at offset y of target T_i, where:

$$y = (x - X_i) + Y_i \qquad \forall x \in [X_i, X_{i+1}) \tag{1}$$

For example, consider the following map:

To read this stream, we satisfy read requests for offsets 0 to 4095 of the stream from offsets 0 to 4095 in urn:aff4:83a3d6db-85d5.
Requests for bytes between andoftendifferentaccesslevelsaredesired.Forexample,for 4096 and 8191 are fetched from urn:aff4:f901be8e-d4b2 from evidence set containing both network captures and disk offset10000.Finallybytesafter8192(untilthespecifiedsize imagesitmaybedesirabletolimitaccesstostreamsbasedon of the stream) are fetched from offset 5000 in urn:af- legalauthorizations,eventhoughthesamesetisdistributed f4:83a3d6db-85d5. toanumberofpeople. Inordertoefficientlyexpressperiodicmapssuchasthose AlthoughtheZip64standardspecifiesencryption,itisnot foundinRAIDarrays,theMapstreammaybeprovidedwith suitableforourpurposessinceitencryptseachsegmentsepa- twooptionalparameters:atarget_period(T ),andstream_period rately,anddoesnotspecifyasufficientlyflexiblescheme(e.g. p (S ).Ifspecified,theaboverelationbecomes: supportforPKIorPGPkeys).Segmentbasedencryptionmay p leadtoinformationleakagewhensegmentsarecompressed,as (cid:2) (cid:3) p:¼floor x theuncompressedsizeofthesegmentmaybededuced. x0:¼mod(cid:4)xSp;S (cid:5) AFF4 therefore introduces a new encryption scheme, the p y:¼ðx0(cid:3)XiÞþYiþp(cid:4)Tp EncryptedStream.TheEncryptedStreamprovidestransparent encryption and decryption onto a single target stream. The Wheremodisthemodulusfunctionandfloorsignifiesinteger target stream actually stores the encrypted data, and read division.ForexampleconsiderListing3,whichcorrespondsto requestsfromthestreamaresatisfiedbydecryptingtherelevant a3diskRAID-5array. datafromthisbackingstream.Theencryptedstreamitselfdoes notstoreanydataatalldalldataisstoredonitstargetstream. 6.3. TheHTTPstream The Encrypted Stream may contain any data at all, includingdiskimages,networkcapturesormemoryimages.It Arguably the most ubiquitous protocol for information isusefulhowever,tostoreanentireAFF4volumewithinthe sharing is the HTTP protocol (Fielding et al., 1999). 
The Encryptedstream.Thisprovidesblocklevelencryptionforthe protocolfeaturesmatureauthenticationandauditingandis contained AFF4 volume (which might contain arbitrary fast and easy to set up with numerous web server imple- streams).ThisapproachisillustratedinFig.5. mentationsavailableonthemarket.TheHTTPprotocolisalso TheresultisthatanumberofAFF4volumesareusedas designedtooperateacrossawiderangeofnetworkarchitec- Container Volumes to provide storage for Encrypted Streams. tures and is therefore more deployable than traditional file The main Embedded Volume, which actually contains data is sharingprotocols. stored within the Encrypted Stream, effectively distributed ForthesereasonsitisdesirabletoallowtheHTTPprotocol throughout the container volumes. Note that the outer tobeusedinfacilitatingthesharingofevidencefilesbetween VolumemaycontainseveralEncryptedStreamsandtherefore investigators.Luckily,theHTTPprotocolfitsnaturallywithin contain multiple AFF4 Encrypted Volumes. Container the URN based scheme adopted by AFF4, since the HTTP Volumes may contain non-encrypted streams as well, and Universal Resource Locator (URL) scheme is a subset of the may implement different encryption schemes and keys for URNscheme. each Encrypted stream. This effectively allows arbitrary For this reason, URLs may be used interchangeably with accesspoliciestobeimplementedasonlyvolumeswhichcan aURNwithintheAFF4universe.Forexample,theaff4:stored beaccessedcanberead. attribute of a volume may be specified as a URL (e.g. http:// intranet/123453/). AFF4 provides transparent support for HTTP and FTP URLs by means of the Curl HTTP library (Various, 2009). The HTTP Stream, therefore satisfies read requestsbymakingHTTPrequeststothewebserver.Weuse the Content-Range HTTP header to request exactly the byte rangetheclientisinterestedin.Thisallowsefficientnetwork transportaswedonotneedtodownloadunnecessarydata, wejustrequestthosechunkstheclientapplicationrequires. 
OurimplementationalsoenablesdirectwritingtoaHTTP URL using the WebDav extensions to HTTP (Goland et al., 1999). The HTTP stream also supports the File Transfer Protocol(FTP)andHTTPS(SecureSocketsLayerdSSL)proto- colstransparently,asprovidedbytheCurllibrary. 6.4. Encryptedstreams Encryptionisanimportantpropertyinanevidencefileformat. Fig.5–EmbeddinganencryptedAFF4volumewithinan Inparticular,multiplestreamsmaybepresentinthefileset, EncryptedStream.Thecontainervolumecontainsan encryptedstreambackedbyanimagestreamwhichisalso 0,0,urn:aff4:83a3d6db-85d5 storedinthecontainer.Oncetheencryptedstreamis 4096,10000,urn:aff4:f901be8e-d4b2 opened,thevolumestoredonitsimagestreamis 8192,5000,urn:aff4:83a3d6db-85d5 accessible.Nowitispossibletoseethesecretimage streamstoredwithinthevolume. digital investigation 6 (2009) S57–S68 S65 6.5. Thelinkobject Toverifythesignatures,theAFF4libraryloadsthestored certificate, then checks the signature for each statement. If AlthoughtheURNofastreamnamesitunambiguouslyinthe astatementisverified(i.e.deemedascorrectaccordingtothe AFF4universeitisdifficulttouseandcommunicateduetoits identity), the relations within it are checked. Note that it is random nature. Most investigators would prefer to use possibleformultipleidentitiestosignthesamedata. ashorternamewhichmightwellrepresenttheimagebetter intheirminds(e.g.acasenameorwarrantnumber). A Link object has a aff4:target attribute. When the Link 8. Usage scenarios object is opened, the object named by this attribute is returned. This allows images with complex names to be InthissectionwedescribehowAFF4maybeusedinvarious referred to via short, meaningful names. In practice both situations. Since the AFF4 framework implements a distrib- ImageStreamsandLinkObjectsareautomaticallycreatedby utedevidencemanagementsystem,wedemonstrateitsuse imagingtools,souserscanalwaysrefertotheImagestream by a fictitious multinational corporation with offices in Los viathesimplifiedLinkname. Angeles and New York. 
Each office has its own computer forensicslabandisconnectedviaaWAN. 8.1. Usingdistributedevidence 7. Identity object AninvestigationisconductedbytheNewYorkteam.Thecase AFF4 defines a Statement as a collection of relations, or relatestoaharddiskImagestreamstoredinsideavolume,in (subject,attribute,value)tuples.Listing4illustratesacollec- turnstoredontheNYevidenceserveratURLhttp://ny.wan/ tion of relations encoded in the standard AFF4 notation evidence1.aff4. The team requires an analyst (Bob) in LA to (SHA254hashesarebase64encoded). assist with their analysis. The LA analyst types2: This ThestatementexpressesasetofattributesofotherAFF4 commandcausesthelocalAFF4implementationto: objects, and in particular the attribute of SHA256 hash is expressed(butotherattributesmayalsobeexpressed). 1) Contacttheuniversalresolveraskingwhere‘‘NYcase1’’is Digitalsignatureshavebeenusedinpreviousforensicfile stored. formats (such as AFF1) to provide authentication and non- 2) Theuniversalresolverrepliesthatitisasymboliclinkto repudiationofforensicevidence.Inessence,awhenaperson a stream called ‘‘urn:aff4:1234’’ stored within the volume signs an object they are vouching for its authenticity. Simi- ‘‘urn:aff4:9876’’. Further queries reveal that the volume is larly,whenapersonsignsaStatement,theyarevouchingfor locatedathttp://ny.wan/evidence1.aff4. itsauthenticity.ThisconceptissimilartotheBillofMaterial 3) ThelocalAFF4librarythendirectlyaccessesthevolumeat (BOM)fromAFF1. thegivenURL. Notethatthe entirevolumeis notcopied, An AFF4 Identity object represents an entity, currently instead specific chunks are retrieved on an as needed describedbywayofanX509certificate.TheURNofanidentity basis. objectisthecertificate’sfingerprint,andisthereforeuniqueto the certificate. 
Identity objects contain aff4:statement attri- TheoveralleffectisthattheuserinLAisabletodirectly buteswhichrefertoAFF4streamscontainingstatements.The access the disk image specified using a friendly name, and identityobjectalsocontainsacopyofthecertificateusedto storedataremotelocationeasily. signthestatements. 8.2. Loadredistribution Listing 3 In the previous scenario, Bob becomes involved in this case, and wishes to download the entire image locally to A Mapstream that correspondsto a3 disk RAID-5 array. The http://la.wan/evidence1.aff4. The Universal Resolver now targetsareURNsfortherespectivedisks.Notethatmapcoor- dinatesaregiveninmultiplesofblocksize. has two possible locations for the same volume URN, since there are two copies in existence. Based on pre- determined distance metrics, the resolver directs requests from Bob to the LA copy, while Alice is redirected to the aff4:block_size¼64k NY copy. This load redistribution can be used for optimal aff4:stream_period¼6 management of evidence storage in a transparent way. aff4:target_period¼3 Analysts are not aware of where the evidence is physically 0,0,disk1 stored, and it appears as though all evidence is always 1,0,disk0 available. 2,1,disk2 IfAlice’slocalNYcopyisnowlost,Alice’slocalAFF4library 3,1,disk1 willfailtoopentheNYURL,andwillautomaticallyfallbackto 4,2,disk0 thecopystoredinLA.ThiswillrequireaccessacrosstheWAN, 5,2,disk2 2FlsisthefilelistingcommandwhichispartoftheSleuthkit.