ebook img

125 Problems in Text Algorithms: with Solutions PDF

345 Pages·2021·9.756 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview 125 Problems in Text Algorithms: with Solutions

125ProblemsinTextAlgorithms Stringmatchingisoneoftheoldestalgorithmictechniques,yetstilloneofthemost pervasiveincomputerscience.Thepast20yearshaveseentechnologicalleapsin applicationsasdiverseasinformationretrievalandcompression.Thiscopiously illustratedcollectionofpuzzlesandexercisesinkeyareasoftextalgorithmsand combinatoricsonwordsoffersgraduatestudentsandresearchersapleasantanddirect waytolearnandpracticewithadvancedconcepts. Theproblemsaredrawnfromalargerangeofscientificpublications,bothclassic andnew.Buildingupfromthebasics,thebookgoesontoshowcaseproblemsin combinatoricsonwords(includingFibonacciorThue–Morsewords),pattern matching(includingKnuth–Morris–PrattandBoyer–Moore–likealgorithms),efficient textdatastructures(includingsuffixtreesandsuffixarrays),regularitiesinwords (includingperiodsandruns)andtextcompression(includingHuffman,Lempel–Ziv andBurrows–Wheeler–basedmethods). Maxime CrochemoreisEmeritusProfessoratUniversitéGustaveEiffeland ofKing’sCollegeLondon.HeholdsanhonorarydoctoratefromtheUniversityof Helsinki.Heistheauthorofmorethan200articlesonalgorithmsonstringsandtheir applications,andco-authorofseveralbooksonthesubject. Thierry LecroqisaprofessorintheDepartmentofComputerScienceatthe UniversityofRouenNormandy(France).Heiscurrentlyheadoftheresearchteam InformationProcessinginBiologyandHealthoftheLaboratoryofComputerScience, InformationProcessingandSystem.Hehasbeenoneofthecoordinatorsofthe workinggroupinstringologyoftheFrenchNationalCentreforScientificResearchfor morethan10years. Wojciech RytterisaprofessorattheFacultyofMathematics,Informatics andMechanics,UniversityofWarsaw.Heistheauthorofalargenumberof publicationsonautomata,formallanguages,parallelalgorithmsandalgorithmson texts.Heisaco-authorofseveralbooksonthesesubjects,includingEfficientParallel Algorithms,TextAlgorithmsandAnalysisofAlgorithmsandDataStructures.Heisa memberofAcademiaEuropaea. 125 Problems in Text Algorithms With Solutions MAXIME CROCHEMORE GustaveEiffelUniversity THIERRY LECROQ UniversityofRouenNormandy WOJCIECH RYTTER UniversityofWarsaw UniversityPrintingHouse,CambridgeCB28BS,UnitedKingdom OneLibertyPlaza,20thFloor,NewYork,NY10006,USA 477WilliamstownRoad,PortMelbourne,VIC3207,Australia 314–321,3rdFloor,Plot3,SplendorForum,JasolaDistrictCentre, NewDelhi–110025,India 79AnsonRoad,#06–04/06,Singapore079906 CambridgeUniversityPressispartoftheUniversityofCambridge. ItfurtherstheUniversity’smissionbydisseminatingknowledgeinthepursuitof education,learning,andresearchatthehighestinternationallevelsofexcellence. www.cambridge.org Informationonthistitle:www.cambridge.org/9781108835831 DOI:10.1017/9781108869317 ©MaximeCrochemore,ThierryLecroq,WojciechRytter2021 IllustrationsdesignedbyHélèneCrochemore Thispublicationisincopyright.Subjecttostatutoryexception andtotheprovisionsofrelevantcollectivelicensingagreements, noreproductionofanypartmaytakeplacewithoutthewritten permissionofCambridgeUniversityPress. Firstpublished2021 PrintedintheUnitedKingdombyTJBooksLtd,PadstowCornwall AcataloguerecordforthispublicationisavailablefromtheBritishLibrary. LibraryofCongressCataloging-in-PublicationData Names:Crochemore,Maxime,1947–author.|Lecroq,Thierry,author.| Rytter,Wojciech,author. Title:Onetwentyfiveproblemsintextalgorithms/MaximeCrochemore, ThierryLecroq,WojciechRytter. Othertitles:125problemsintextalgorithms Description:NewYork:CambridgeUniversityPress,2021.| Thenumerals125aresuperimposedover“Onetwentyfive”onthetitlepage.| Includesbibliographicalreferencesandindex. Identifiers:LCCN2021002037(print)|LCCN2021002038(ebook)| ISBN9781108835831(hardback)|ISBN9781108798853(paperback)| ISBN9781108869317(epub) Subjects:LCSH:Textprocessing(Computerscience)–Problems,exercises,etc.| Computeralgorithms–Problems,exercises,etc. Classification:LCCQA76.9.T48C7582021(print)| LCCQA76.9.T48(ebook)|DDC005.13–dc23 LCrecordavailableathttps://lccn.loc.gov/2021002037 LCebookrecordavailableathttps://lccn.loc.gov/2021002038 ISBN978-1-108-83583-1Hardback ISBN978-1-108-79885-3Paperback CambridgeUniversityPresshasnoresponsibilityforthepersistenceoraccuracyof URLsforexternalorthird-partyinternetwebsitesreferredtointhispublication anddoesnotguaranteethatanycontentonsuchwebsitesis,orwillremain, accurateorappropriate. Contents Preface pageix 1 TheVeryBasicsofStringology 1 2 CombinatorialPuzzles 17 1 StringologicProofofFermat’sLittleTheorem 18 2 SimpleCaseofCodicityTesting 19 3 MagicSquaresandtheThue–MorseWord 20 4 Oldenburger–KolakoskiSequence 22 5 Square-FreeGame 24 6 FibonacciWordsandFibonacciNumerationSystem 26 7 Wythoff’sGameandFibonacciWord 28 8 DistinctPeriodicWords 30 9 ARelativeoftheThue–MorseWord 33 10 Thue–MorseWordsandSumsofPowers 34 11 ConjugatesandRotationsofWords 35 12 ConjugatePalindromes 37 13 ManyWordswithManyPalindromes 39 14 ShortSuperwordofPermutations 41 15 ShortSupersequenceofPermutations 43 16 SkolemWords 45 17 LangfordWords 48 18 FromLyndonWordstodeBruijnWords 50 3 PatternMatching 53 19 BorderTable 54 20 ShortestCovers 56 21 ShortBorders 58 v vi Contents 22 PrefixTable 60 23 BorderTabletotheMaximalSuffix 62 24 PeriodicityTest 65 25 StrictBorders 67 26 DelayofSequentialStringMatching 70 27 SparseMatchingAutomaton 72 28 Comparison-EffectiveStringMatching 74 29 StrictBorderTableoftheFibonacciWord 76 30 WordswithSingletonVariables 78 31 Order-PreservingPatterns 81 32 ParameterisedMatching 83 33 Good-SuffixTable 85 34 WorstCaseoftheBoyer–MooreAlgorithm 88 35 Turbo-BMAlgorithm 90 36 StringMatchingwithDon’tCares 92 37 CyclicEquivalence 93 38 SimpleMaximalSuffixComputation 96 39 Self-MaximalWords 98 40 MaximalSuffixandItsPeriod 100 41 CriticalPositionofaWord 103 42 PeriodsofLyndonWordPrefixes 105 43 SearchingZiminWords 107 44 SearchingIrregular2DPatterns 110 4 EfficientDataStructures 111 45 ListAlgorithmforShortestCover 112 46 ComputingLongestCommonPrefixes 113 47 SuffixArraytoSuffixTree 115 48 LinearSuffixTrie 119 49 TernarySearchTrie 122 50 LongestCommonFactorofTwoWords 124 51 SubsequenceAutomaton 126 52 CodicityTest 128 53 LPFTable 130 54 SortingSuffixesofThue–MorseWords 134 55 BareSuffixTree 137 56 ComparingSuffixesofaFibonacciWord 139 57 AvoidabilityofBinaryWords 141 58 AvoidingaSetofWords 144 vii 59 MinimalUniqueFactors 146 60 MinimalAbsentWords 148 61 GreedySuperstring 152 62 ShortestCommonSuperstringofShortWords 155 63 CountingFactorsbyLength 157 64 CountingFactorsCoveringaPosition 160 65 LongestCommon-ParityFactors 161 66 WordSquare-FreenesswithDBF 162 67 GenericWordsofFactorEquations 164 68 SearchinganInfiniteWord 166 69 PerfectWords 169 70 DenseBinaryWords 173 71 FactorOracle 175 5 RegularitiesinWords 180 72 ThreeSquarePrefixes 181 73 TightBoundsonOccurrencesofPowers 183 74 ComputingRunsonGeneralAlphabets 185 75 TestingOverlapsinaBinaryWord 188 76 Overlap-FreeGame 190 77 AnchoredSquares 192 78 AlmostSquare-FreeWords 195 79 BinaryWordswithFewSquares 197 80 BuildingLongSquare-FreeWords 199 81 TestingMorphismSquare-Freeness 201 82 NumberofSquareFactorsinLabelledTrees 203 83 CountingSquaresinCombsinLinearTime 206 84 CubicRuns 208 85 ShortSquareandLocalPeriod 210 86 TheNumberofRuns 212 87 ComputingRunsonSortedAlphabet 214 88 PeriodicityandFactorComplexity 219 89 PeriodicityofMorphicWords 220 90 SimpleAnti-powers 222 91 PalindromicConcatenationofPalindromes 224 92 PalindromeTrees 225 93 UnavoidablePatterns 227 viii Contents 6 TextCompression 230 94 BWTransformofThue–MorseWords 231 95 BWTransformofBalancedWords 233 96 In-placeBWTransform 237 97 Lempel–ZivFactorisation 239 98 Lempel–Ziv–WelchDecoding 242 99 CostofaHuffmanCode 244 100 Length-LimitedHuffmanCoding 248 101 OnlineHuffmanCoding 253 102 Run-LengthEncoding 256 103 ACompactFactorAutomaton 261 104 CompressedMatchinginaFibonacciWord 264 105 PredictionbyPartialMatching 266 106 CompressingSuffixArrays 269 107 CompressionRatioofGreedySuperstrings 271 7 Miscellaneous 275 108 BinaryPascalWords 276 109 Self-ReproducingWords 278 110 WeightsofFactors 280 111 Letter-OccurrenceDifferences 282 112 FactoringwithBorder-FreePrefixes 283 113 PrimitivityTestforUnaryExtensions 286 114 PartiallyCommutativeAlphabets 288 115 GreatestFixed-DensityNecklace 290 116 Period-EquivalentBinaryWords 292 117 OnlineGenerationofdeBruijnWords 295 118 RecursiveGenerationofdeBruijnWords 298 119 WordEquationswithGivenLengthsofVariables 300 120 DiverseFactorsoveraThree-LetterAlphabet 302 121 LongestIncreasingSubsequence 304 122 UnavoidableSetsviaLyndonWords 306 123 SynchronisingWords 309 124 Safe-OpeningWords 311 125 SuperwordsofShortenedPermutations 314 Bibliography 318 Index 332 Preface This book is about algorithms on texts, also called algorithmic stringology. Text (word, string, sequence) is one of the main unstructured data types and thesubjectisofvitalimportanceincomputerscience. Thesubjectisversatilebecauseitisabasicrequirementinmanysciences, especiallyincomputerscienceandengineering.Thetreatmentofunstructured data is a very lively area and demands efficient methods owing both to their presenceinhighlyrepetitiveinstructionsofoperatingsystemsandtothevast amountofdatathatneedstobeanalysedondigitalnetworksandequipments. Thelatterisclearforinformationtechnologycompaniesthatmanagemassive data in their data centres but also holds for most scientific areas beyond Computerscience. The book presents a collection of the most interesting representative problems in stringology. They are introduced in a short and pleasant way andopendoorstomoreadvancedtopics.Theywereextractedfromhundreds of serious scientific publications, some of which are more than a hundred years old and some are very fresh and up to date. Most of the problems are relatedtoapplicationswhileothersaremoreabstract.Thecorepartofmostof themisaningeniousshortalgorithmicsolutionexceptforafewintroductory combinatorialproblems. This is not just yet another monograph on the subject but a series of problems(puzzlesandexercises).Itisacomplementtobooksdedicatedtothe subjectinwhichtopicsareintroducedinamoreacademicandcomprehensive way.Nevertheless,mostconceptsinthefieldareincludedinthebook,which fillsamissinggapandisveryexpectedandneeded,especiallyforstudentsand teachers,asthefirstproblem-solvingtextbookofthedomain. ix

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.