Linköping Studies in Science and Technology
Thesis No. 1361

Completing the Picture — Fragments and Back Again

by

Martin Karresand

Submitted to Linköping Institute of Technology at Linköping University in partial fulfilment of the requirements for the degree of Licentiate of Engineering

Department of Computer and Information Science
Linköpings universitet
SE-581 83 Linköping, Sweden

Linköping 2008

May 2008
ISBN 978-91-7393-915-7
ISSN 0280–7971
LiU–Tek–Lic–2008:19

ABSTRACT

Better methods and tools are needed in the fight against child pornography. This thesis presents a method for file type categorisation of unknown data fragments, a method for reassembly of JPEG fragments, and the requirements put on an artificial JPEG header for viewing reassembled images. To enable empirical evaluation of the methods, a number of tools based on the methods have been implemented.

The file type categorisation method identifies JPEG fragments with a detection rate of 100% and a false positives rate of 0.1%. The method uses three algorithms: Byte Frequency Distribution (BFD), Rate of Change (RoC), and 2-grams. The algorithms are designed for different situations, depending on the requirements at hand.

The reconnection method correctly reconnects 97% of a Restart (RST) marker enabled JPEG image, fragmented into 4 KiB large pieces. When dealing with fragments from several images at once, the method is able to correctly connect 70% of the fragments at the first iteration.

Two parameters in a JPEG header are crucial to the quality of the image: the size of the image and the sampling factor (actually factors) of the image. The size can be found using brute force and the sampling factors only take on three different values. Hence it is possible to use an artificial JPEG header to view all or parts of an image. The only requirement is that the fragments contain RST markers.

The results of the evaluations of the methods show that it is possible to find, reassemble, and view JPEG image fragments with high certainty.

This work has been supported by The Swedish Defence Research Agency and the Swedish Armed Forces.

Acknowledgements

This licentiate thesis would not have been written without the invaluable support of my supervisor Professor Nahid Shahmehri. I would like to thank her for keeping me and my research on track and having faith in me when the going has been tough. She is a good role model and always gives me support, encouragement, and inspiration to bring my research forward.

Many thanks go to Helena A, Jocke, Jonas, uncle Lars, Limpan, Micke F, Micke W, Mirko, and Mårten. Without hesitation you let me into your homes through the lenses of your cameras. If a picture is worth a thousand words, I owe you more than nine million! I also owe a lot of words to Brittany Shahmehri. Her prompt and thorough proof-reading has indeed increased the readability of my thesis.

I would also like to thank my colleagues at the Swedish Defence Research Agency (FOI), my friends at the National Laboratory of Forensic Science (SKL) and the National Criminal Investigation Department (RKP), and my fellow PhD students at the Laboratory for Intelligent Information Systems (IISLAB) and the Division for Database and Information Techniques (ADIT). You inspired me to embark on this journey. Thank you all, you know who you are!

And last but not least I would like to thank my beloved wife Helena and our lovely newborn daughter. You bring happiness and joy to my life.

Finally I acknowledge the financial support by FOI and the Swedish Armed Forces.

Martin Karresand
Linköping, 14th April 2008

Contents

1 Introduction
    1.1 Motivation
    1.2 Problem Formulation
    1.3 Contributions
    1.4 Scope
    1.5 Outline of Method
    1.6 Outline of Thesis
2 Identifying Fragment Types
    2.1 Common Algorithmic Features
        2.1.1 Centroid
        2.1.2 Length of data atoms
        2.1.3 Measuring Distance
    2.2 Byte Frequency Distribution
    2.3 Rate of Change
    2.4 2-Grams
    2.5 Evaluation
        2.5.1 Microsoft Windows PE files
        2.5.2 Encrypted files
        2.5.3 JPEG files
        2.5.4 MP3 files
        2.5.5 Zip files
        2.5.6 Algorithms
    2.6 Results
        2.6.1 Microsoft Windows PE files
        2.6.2 Encrypted files
        2.6.3 JPEG files
        2.6.4 MP3 files
        2.6.5 Zip files
        2.6.6 Algorithms
3 Putting Fragments Together
    3.1 Background
    3.2 Requirements
    3.3 Parameters Used
        3.3.1 Background
        3.3.2 Correct decoding
        3.3.3 Non-zero frequency values
        3.3.4 Luminance DC value chains
    3.4 Evaluation
        3.4.1 Single image reconnection
        3.4.2 Multiple image reconnection
    3.5 Result
        3.5.1 Single image reconnection
        3.5.2 Multiple image reconnection
4 Viewing Damaged JPEG Images
    4.1 Start of Frame
    4.2 Define Quantization Table
    4.3 Define Huffman Table
    4.4 Define Restart Interval
    4.5 Start of Scan
    4.6 Combined Errors
    4.7 Using an Artificial JPEG Header
    4.8 Viewing Fragments
5 Discussion
    5.1 File Type Categorisation
    5.2 Fragment Reconnection
    5.3 Viewing Fragments
    5.4 Conclusion
6 Related Work
7 Future Work
    7.1 The File Type Categorisation Method
    7.2 The Image Fragment Reconnection Method
    7.3 Artificial JPEG Header
Bibliography
A Acronyms
B Hard Disk Allocation Strategies
C Confusion Matrices

List of Figures

2.1 Byte frequency distribution of .exe
2.2 Byte frequency distribution of GPG
2.3 Byte frequency distribution of JPEG with RST
2.4 Byte frequency distribution of JPEG without RST
2.5 Byte frequency distribution of MP3
2.6 Byte frequency distribution of Zip
2.7 Rate of Change frequency distribution for .exe
2.8 Rate of Change frequency distribution for GPG
2.9 Rate of Change frequency distribution for JPEG with RST
2.10 Rate of Change frequency distribution for MP3
2.11 Rate of Change frequency distribution for Zip
2.12 2-gram frequency distribution for .exe
2.13 Byte frequency distribution of GPG with CAST5
2.14 ROC curves for Windows PE files
2.15 ROC curves for an AES encrypted file
2.16 ROC curves for files JPEG without RST
2.17 ROC curves for JPEG without RST; 2-gram algorithm
2.18 ROC curves for files JPEG with RST
2.19 ROC curves for MP3 files
2.20 ROC curves for MP3 files; 0.5% false positives
2.21 ROC curves for Zip files
2.22 Contour plot for a 2-gram Zip file centroid
3.1 The frequency domain of a data unit
3.2 The zig-zag ordering of a data unit traversal
3.3 The scan part binary format coding
4.1 The original undamaged image
4.2 The Start Of Frame (SOF) marker segment
4.3 Quantization tables with swapped sample rate
4.4 Luminance table with high sample rate
4.5 Luminance table with low sample rate
4.6 Swapped chrominance component identifiers
4.7 Swapped luminance and chrominance component identifiers
4.8 Moderately wrong image width
4.9 The Define Quantization Table (DQT) marker segment
4.10 Luminance DC component set to 0xFF
4.11 Chrominance DC component set to 0xFF
4.12 The Define Huffman Table (DHT) marker segment
4.13 Image with foreign Huffman tables definition
4.14 The Define Restart Interval (DRI) marker segment
4.15 Short restart interval setting
4.16 The Start Of Scan (SOS) marker segment
4.17 Luminance DC Huffman table set to chrominance ditto
4.18 Complete exchange of Huffman table pointers
4.19 A correct sequence of fragments
4.20 An incorrect sequence of fragments
5.1 Possible fragment parts