
Compact Data Structures: A Practical Approach PDF

567 Pages·2016·4.869 MB·English
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Compact Data Structures: A Practical Approach

Compact Data Structures
A Practical Approach
Gonzalo Navarro
Department of Computer Science, University of Chile

One Liberty Plaza, 20th Floor, New York, NY 10006, USA
Cambridge University Press is part of the University of Cambridge.
www.cambridge.org
Information on this title: www.cambridge.org/9781107152380
© Gonzalo Navarro 2016
First published 2016
Printed in the United States of America by Sheridan Books, Inc.
A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Navarro, Gonzalo, 1969– author.
Title: Compact data structures : a practical approach / Gonzalo Navarro, Universidad de Chile.
Description: New York, NY : University of Cambridge, [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2016023641 | ISBN 9781107152380 (hardback : alk. paper)
Subjects: LCSH: Data structures (Computer science) | Computer algorithms.
Classification: LCC QA76.9.D35 N38 2016 | DDC 005.7/3–dc23
LC record available at https://lccn.loc.gov/2016023641
ISBN 978-1-107-15238-0 Hardback

Contents

List of Algorithms
Foreword
Acknowledgments

1 Introduction
1.1 Why Compact Data Structures?
1.2 Why This Book?
1.3 Organization
1.4 Software Resources
1.5 Mathematics and Notation
1.6 Bibliographic Notes

2 Entropy and Coding
2.1 Worst-Case Entropy
2.2 Shannon Entropy
2.3 Empirical Entropy
2.3.1 Bit Sequences
2.3.2 Sequences of Symbols
2.4 High-Order Entropy
2.5 Coding
2.6 Huffman Codes
2.6.1 Construction
2.6.2 Encoding and Decoding
2.6.3 Canonical Huffman Codes
2.6.4 Better than Huffman
2.7 Variable-Length Codes for Integers
2.8 Jensen’s Inequality
2.9 Application: Positional Inverted Indexes
2.10 Summary
2.11 Bibliographic Notes

3 Arrays
3.1 Elements of Fixed Size
3.2 Elements of Variable Size
3.2.1 Sampled Pointers
3.2.2 Dense Pointers
3.3 Partial Sums
3.4 Applications
3.4.1 Constant-Time Array Initialization
3.4.2 Direct Access Codes
3.4.3 Elias-Fano Codes
3.4.4 Differential Encodings and Inverted Indexes
3.4.5 Compressed Text Collections
3.5 Summary
3.6 Bibliographic Notes

4 Bitvectors
4.1 Access
4.1.1 Zero-Order Compression
4.1.2 High-Order Compression
4.2 Rank
4.2.1 Sparse Sampling
4.2.2 Constant Time
4.2.3 Rank on Compressed Bitvectors
4.3 Select
4.3.1 A Simple Heuristic
4.3.2 An O(log log n) Time Solution
4.3.3 Constant Time
4.4 Very Sparse Bitvectors
4.4.1 Constant-Time Select
4.4.2 Solving Rank
4.4.3 Bitvectors with Runs
4.5 Applications
4.5.1 Partial Sums Revisited
4.5.2 Predecessors and Successors
4.5.3 Dictionaries, Sets, and Hashing
4.6 Summary
4.7 Bibliographic Notes

5 Permutations
5.1 Inverse Permutations
5.2 Powers of Permutations
5.3 Compressible Permutations
5.4 Applications
5.4.1 Two-Dimensional Points
5.4.2 Inverted Indexes Revisited
5.5 Summary
5.6 Bibliographic Notes

6 Sequences
6.1 Using Permutations
6.1.1 Chunk-Level Granularity
6.1.2 Operations within a Chunk
6.1.3 Construction
6.1.4 Space and Time
6.2 Wavelet Trees
6.2.1 Structure
6.2.2 Solving Rank and Select
6.2.3 Construction
6.2.4 Compressed Wavelet Trees
6.2.5 Wavelet Matrices
6.3 Alphabet Partitioning
6.4 Applications
6.4.1 Compressible Permutations Again
6.4.2 Compressed Text Collections Revisited
6.4.3 Non-positional Inverted Indexes
6.4.4 Range Quantile Queries
6.4.5 Revisiting Arrays of Variable-Length Cells
6.5 Summary
6.6 Bibliographic Notes

7 Parentheses
7.1 A Simple Implementation
7.1.1 Range Min-Max Trees
7.1.2 Forward and Backward Searching
7.1.3 Range Minima and Maxima
7.1.4 Rank and Select Operations
7.2 Improving the Complexity
7.2.1 Queries inside Buckets
7.2.2 Forward and Backward Searching
7.2.3 Range Minima and Maxima
7.2.4 Rank and Select Operations
7.3 Multi-Parenthesis Sequences
7.3.1 Nearest Marked Ancestors
7.4 Applications
7.4.1 Succinct Range Minimum Queries
7.4.2 XML Documents
7.5 Summary
7.6 Bibliographic Notes

8 Trees
8.1 LOUDS: A Simple Representation
8.1.1 Binary and Cardinal Trees
8.2 Balanced Parentheses
8.2.1 Binary Trees Revisited
8.3 DFUDS Representation
8.3.1 Cardinal Trees Revisited
8.4 Labeled Trees
8.5 Applications
8.5.1 Routing in Minimum Spanning Trees
8.5.2 Grammar Compression
8.5.3 Tries
8.5.4 LZ78 Compression
8.5.5 XML and XPath
8.5.6 Treaps
8.5.7 Integer Functions
8.6 Summary
8.7 Bibliographic Notes

9 Graphs
9.1 General Graphs
9.1.1 Using Bitvectors
9.1.2 Using Sequences
9.1.3 Undirected Graphs
9.1.4 Labeled Graphs
9.1.5 Construction
9.2 Clustered Graphs
9.2.1 K²-Tree Structure
9.2.2 Queries
9.2.3 Reducing Space
9.2.4 Construction
9.3 K-Page Graphs
9.3.1 One-Page Graphs
9.3.2 K-Page Graphs
9.3.3 Construction
9.4 Planar Graphs
9.4.1 Orderly Spanning Trees
9.4.2 Triangulations
9.4.3 Construction
9.5 Applications
9.5.1 Binary Relations
9.5.2 RDF Datasets
9.5.3 Planar Routing
9.5.4 Planar Drawings
9.6 Summary
9.7 Bibliographic Notes

10 Grids
10.1 Wavelet Trees
10.1.1 Counting
10.1.2 Reporting
10.1.3 Sorted Reporting
10.2 K²-Trees
10.2.1 Reporting
10.3 Weighted Points
10.3.1 Wavelet Trees
10.3.2 K²-Trees
10.4 Higher Dimensions
10.5 Applications
10.5.1 Dominating Points
10.5.2 Geographic Information Systems
10.5.3 Object Visibility
10.5.4 Position-Restricted Searches on Suffix Arrays
10.5.5 Searching for Fuzzy Patterns
10.5.6 Indexed Searching in Grammar-Compressed Text
10.6 Summary
10.7 Bibliographic Notes

11 Texts
11.1 Compressed Suffix Arrays
11.1.1 Replacing A with Ψ
11.1.2 Compressing Ψ
11.1.3 Backward Search
11.1.4 Locating and Displaying
11.2 The FM-Index
11.3 High-Order Compression
11.3.1 The Burrows-Wheeler Transform
11.3.2 High-Order Entropy
11.3.3 Partitioning L into Uniform Chunks
11.3.4 High-Order Compression of Ψ
11.4 Construction
11.4.1 Suffix Array Construction
11.4.2 Building the BWT
11.4.3 Building Ψ
11.5 Suffix Trees
11.5.1 Longest Common Prefixes
11.5.2 Suffix Tree Operations
11.5.3 A Compact Representation
11.5.4 Construction
11.6 Applications
11.6.1 Finding Maximal Substrings of a Pattern
11.6.2 Labeled Trees Revisited
11.6.3 Document Retrieval
11.6.4 XML Retrieval Revisited
11.7 Summary
11.8 Bibliographic Notes

12 Dynamic Structures
12.1 Bitvectors
12.1.1 Solving Queries
12.1.2 Handling Updates
12.1.3 Compressed Bitvectors
12.2 Arrays and Partial Sums
12.3 Sequences
12.4 Trees
12.4.1 LOUDS Representation
12.4.2 BP Representation
12.4.3 DFUDS Representation
12.4.4 Dynamic Range Min-Max Trees
12.4.5 Labeled Trees
12.5 Graphs and Grids
12.5.1 Dynamic Wavelet Matrices
12.5.2 Dynamic k²-Trees
12.6 Texts
12.6.1 Insertions
12.6.2 Document Identifiers
12.6.3 Samplings
12.6.4 Deletions
12.7 Memory Allocation
12.8 Summary
12.9 Bibliographic Notes

13 Recent Trends
13.1 Encoding Data Structures
13.1.1 Effective Entropy
13.1.2 The Entropy of RMQs
13.1.3 Expected Effective Entropy
13.1.4 Other Encoding Problems
13.2 Repetitive Text Collections
13.2.1 Lempel-Ziv Compression
13.2.2 Lempel-Ziv Indexing
13.2.3 Faster and Larger Indexes
13.2.4 Compressed Suffix Arrays and Trees
13.3 Secondary Memory
13.3.1 Bitvectors
13.3.2 Sequences
13.3.3 Trees
13.3.4 Grids and Graphs
13.3.5 Texts

Index

Algorithms

2.1 Building a prefix code given the desired lengths
2.2 Building a Huffman tree
2.3 Building a Canonical Huffman code representation
2.4 Reading a symbol with a Canonical Huffman code
2.5 Various integer encodings
3.1 Reading and writing on bit arrays
3.2 Reading and writing on fixed-length cell arrays
3.3 Manipulating initializable arrays
3.4 Reading from a direct access code representation
3.5 Creating direct access codes from an array
3.6 Finding optimal piece lengths for direct access codes
3.7 Intersection of inverted lists
4.1 Encoding and decoding bit blocks as pairs (c, o)
4.2 Answering access on compressed bitvectors
4.3 Answering rank with sparse sampling
4.4 Answering rank with dense sampling
4.5 Answering rank on compressed bitvectors
4.6 Answering select with sparse sampling
4.7 Building the select structures
4.8 Answering select and rank on very sparse bitvectors
4.9 Building the structures for very sparse bitvectors
4.10 Building a perfect hash function
5.1 Answering π^-1 with shortcuts
5.2 Building the shortcut structure
5.3 Answering π^k with the cycle decomposition
5.4 Answering π and π^-1 on compressible permutations
5.5 Building the compressed permutation representation, part 1
5.6 Building the compressed permutation representation, part 2
6.1 Answering queries with the permutation-based structure
6.2 Building the permutation-based representation of a sequence
6.3 Answering access and rank with wavelet trees
6.4 Answering select with wavelet trees
6.5 Building a wavelet tree
6.6 Answering access and rank with wavelet matrices
6.7 Answering select with wavelet matrices
6.8 Building a wavelet matrix
6.9 Building a suitable Huffman code for wavelet matrices
6.10 Building a wavelet matrix from Huffman codes
6.11 Answering queries with alphabet partitioning
6.12 Building the alphabet partitioning representation
6.13 Answering π and π^-1 using sequences
6.14 Inverted list intersection using a sequence representation
6.15 Non-positional inverted list intersection
6.16 Solving range quantile queries on wavelet trees
7.1 Converting between leaf numbers and positions of rmM-trees
7.2 Building the C table for the rmM-trees
7.3 Building the rmM-tree
7.4 Scanning a block for fwdsearch(i, d)
7.5 Computing fwdsearch(i, d)
7.6 Computing bwdsearch(i, d)
7.7 Scanning a block for min(i, j)
7.8 Computing the minimum excess in B[i, j]
7.9 Computing mincount(i, j)
7.10 Computing minselect(i, j, t)
7.11 Computing rank_10(i) on B
7.12 Computing select_10(j) on B
7.13 Finding the smallest segment of a type containing a position
7.14 Solving rmq_A with 2n parentheses
7.15 Building the structure for succinct RMQs
8.1 Computing the ordinal tree operations using LOUDS
8.2 Computing lca(u, v) on the LOUDS representation
8.3 Building the LOUDS representation
8.4 Computing the cardinal tree operations using LOUDS
8.5 Computing basic binary tree operations using LOUDS
8.6 Building the BP representation of an ordinal tree
8.7 Computing the simple BP operations on ordinal trees
8.8 Computing the complex BP operations on ordinal trees
8.9 Building the BP representation of a binary tree
8.10 Computing basic binary tree operations using BP
8.11 Computing advanced binary tree operations using BP
8.12 Building the DFUDS representation
8.13 Computing the simple DFUDS operations on ordinal trees
8.14 Computing the complex DFUDS operations on ordinal trees
8.15 Computing the additional cardinal tree operations on DFUDS
8.16 Computing the labeled tree operations on LOUDS or DFUDS
8.17 Enumerating the path from u to v with LOUDS


