A ProgrAmmer’s ComPAnion to Algorithm AnAlysis © 2007 by Taylor & Francis Group, LLC C6730_C000a.indd 1 08/14/2006 3:53:08 PM A ProgrAmmer’s ComPAnion to Algorithm AnAlysis ernst l. leiss University of Houston, Texas, U.S.A. © 2007 by Taylor & Francis Group, LLC C6730_C000a.indd 3 08/14/2006 3:53:08 PM Chapman & Hall/CRC Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2007 by Taylor & Francis Group, LLC Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number-10: 1-58488-673-0 (Softcover) International Standard Book Number-13: 978-1-58488-673-0 (Softcover) This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the conse- quences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www. copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Leiss, Ernst L., 1952- A programmer’s companion to algorithm analysis / Ernst L. Leiss. p. cm. Includes bibliographical references and index. ISBN 1-58488-673-0 (acid-free paper) 1. Programming (Mathematics) 2. Algorithms--Data processing. I. Title. QA402.5.L398 2006 005.1--dc22 2006044552 Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com © 2007 by Taylor & Francis Group, LLC TC&6F73_0L_OCC0_0A0_aM.inadsdte r .4indd 1 08/164/1/240/0066 83::5563::3068 APMM C6730_C000.fm Page v Monday, July 3, 2006 2:30 PM Preface The primary emphasis of this book is the transition from an algorithm to a program. Given a problem to solve, the typical first step is the design of an algorithm; this algorithm is then translated into software. We will look care- fully at the interface between the design and analysis of algorithms on the one hand and the resulting program solving the problem on the other. This approach is motivated by the fact that algorithms for standard problems are readily available in textbooks and literature and are frequently used as building blocks for more complex designs. Thus, the correctness of the algo- rithm is much less a concern than its adaptation to a working program. Many textbooks, several excellent, are dedicated to algorithms, their design, their analysis, the techniques involved in creating them, and how to determine their time and space complexities. They provide the building blocks of the overall design. These books are usually considered part of the theoretical side of computing. There are also numerous books dedicated to designing software, from those concentrating on programming in the small (designing and debugging individual programs) to programming in the large (looking at large systems in their totality). These books are usually viewed as belonging to software engineering. However, there are no books that look systematically at the gap separating the theory of algorithms and software engineering, even though many things can go wrong in taking several algorithms and producing a software product derived from them. This book is intended to fill this gap. It is not intended to teach algorithms from scratch; indeed, I assume the reader has already been exposed to the ordinary machinery of algorithm design, including the standard algorithms for sorting and searching and techniques for analyzing the correctness and complexity of algorithms (although the most important ones will be reviewed). Nor is this book meant to teach software design; I assume that the reader has already gained experience in designing reasonably complex software systems. Ideally, the readers’ interest in this book’s topic was prompted by the uncomfortable realization that the path from algorithm to software was much more arduous than anticipated, and, indeed, results obtained on the theory side of the development process, be they results derived by readers or acquired from textbooks, did not translate satisfac- torily to corresponding results, that is, performance, for the developed software. Even if the reader has never encountered a situation where the performance predicted by the complexity analysis of a specific algorithm did not correspond to the performance observed by running the resulting software, I argue that such occurrences are increasingly more likely, given © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page vi Monday, July 3, 2006 2:30 PM the overall development of our emerging hardware platforms and software environments. In many cases, the problems I will address are rooted in the different way memory is viewed. For the designer of an algorithm, memory is inexhaust- ible, has uniform access properties, and generally behaves nicely (I will be more specific later about the meaning of niceness). Programmers, however, have to deal with memory hierarchies, limits on the availability of each class of memory, and the distinct nonuniformity of access characteristics, all of which imply a definite absence of niceness. Additionally, algorithm designers assume to have complete control over their memory, while software design- ers must deal with several agents that are placed between them and the actual memory — to mention the most important ones, compilers and oper- ating systems, each of which has its own idiosyncrasies. All of these conspire against the software designer who has the naïve and often seriously disap- pointed expectation that properties of algorithms easily translate into prop- erties of programs. The book is intended for software developers with some exposure to the design and analysis of algorithms and data structures. The emphasis is clearly on practical issues, but the book is naturally dependent on some knowledge of standard algorithms — hence the notion that it is a companion book. It can be used either in conjunction with a standard algorithm text, in which case it would most likely be within the context of a course setting, or it can be used for independent study, presumably by practitioners of the software development process who have suffered disappointments in apply- ing the theory of algorithms to the production of efficient software. © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page vii Monday, July 3, 2006 2:30 PM Contents Foreword..............................................................................................................xiii Part 1 The Algorithm Side: Regularity, Predictability, and Asymptotics 1 A Taxonomy of Algorithmic Complexity.....................................3 1.1 Introduction....................................................................................................3 1.2 The Time and Space Complexities of an Algorithm................................5 1.3 The Worst-, Average-, and Best-Case Complexities of an Algorithm...9 1.3.1 Scenario 1..........................................................................................11 1.3.2 Scenario 2..........................................................................................12 1.4 Bit versus Word Complexity......................................................................12 1.5 Parallel Complexity.....................................................................................15 1.6 I/O Complexity...........................................................................................17 1.6.1 Scenario 1..........................................................................................18 1.6.2 Scenario 2..........................................................................................20 1.7 On-Line versus Off-Line Algorithms.......................................................22 1.8 Amortized Analysis.....................................................................................24 1.9 Lower Bounds and Their Significance.....................................................24 1.10 Conclusion....................................................................................................30 Bibliographical Notes...........................................................................................30 Exercises.................................................................................................................31 2 Fundamental Assumptions Underlying Algorithmic Complexity...............................................................37 2.1 Introduction..................................................................................................37 2.2 Assumptions Inherent in the Determination of Statement Counts.........................................................................................38 2.3 All Mathematical Identities Hold.............................................................44 2.4 Revisiting the Asymptotic Nature of Complexity Analysis.................45 2.5 Conclusion....................................................................................................46 Bibliographical Notes...........................................................................................47 Exercises.................................................................................................................47 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page viii Monday, July 3, 2006 2:30 PM 3 Examples of Complexity Analysis..............................................49 3.1 General Techniques for Determining Complexity.................................49 3.2 Selected Examples: Determining the Complexity of Standard Algorithms...................................................................................53 3.2.1 Multiplying Two m-Bit Numbers....................................................54 3.2.2 Multiplying Two Square Matrices................................................55 3.2.3 Optimally Sequencing Matrix Multiplications...........................57 3.2.4 MergeSort.........................................................................................59 3.2.5 QuickSort..........................................................................................60 3.2.6 HeapSort...........................................................................................62 3.2.7 RadixSort..........................................................................................65 3.2.8 Binary Search...................................................................................67 3.2.9 Finding the Kth Largest Element.................................................68 3.2.10 Search Trees......................................................................................71 3.2.10.1 Finding an Element in a Search Tree.............................72 3.2.10.2 Inserting an Element into a Search Tree.......................73 3.2.10.3 Deleting an Element from a Search Tree......................74 3.2.10.4 Traversing a Search Tree..................................................76 3.2.11 AVL Trees..........................................................................................76 3.2.11.1 Finding an Element in an AVL Tree..............................76 3.2.11.2 Inserting an Element into an AVL Tree.........................77 3.2.11.3 Deleting an Element from an AVL Tree........................83 3.2.12 Hashing.............................................................................................84 3.2.13 Graph Algorithms...........................................................................87 3.2.13.1 Depth-First Search............................................................88 3.2.13.2 Breadth-First Search.........................................................89 3.2.13.3 Dijkstra’s Algorithm.........................................................91 3.3 Conclusion....................................................................................................92 Bibliographical Notes...........................................................................................92 Exercises.................................................................................................................93 Part 2 The Software Side: Disappointments and How to Avoid Them 4 Sources of Disappointments......................................................103 4.1 Incorrect Software......................................................................................103 4.2 Performance Discrepancies......................................................................105 4.3 Unpredictability.........................................................................................109 4.4 Infeasibility and Impossibility.................................................................111 4.5 Conclusion..................................................................................................113 Bibliographical Notes.........................................................................................114 Exercises...............................................................................................................115 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page ix Monday, July 3, 2006 2:30 PM 5 Implications of Nonuniform Memory for Software...............117 5.1 The Influence of Virtual Memory Management...................................118 5.2 The Case of Caches...................................................................................123 5.3 Testing and Profiling.................................................................................124 5.4 What to Do about It..................................................................................125 Bibliographical Notes.........................................................................................136 Exercises...............................................................................................................137 6 Implications of Compiler and Systems Issues for Software.................................................................................141 6.1 Introduction................................................................................................141 6.2 Recursion and Space Complexity...........................................................142 6.3 Dynamic Structures and Garbage Collection........................................145 6.4 Parameter-Passing Mechanisms..............................................................150 6.5 Memory Mappings....................................................................................155 6.6 The Influence of Language Properties...................................................155 6.6.1 Initialization...................................................................................155 6.6.2 Packed Data Structures................................................................157 6.6.3 Overspecification of Execution Order.......................................158 6.6.4 Avoiding Range Checks...............................................................159 6.7 The Influence of Optimization................................................................160 6.7.1 Interference with Specific Statements........................................160 6.7.2 Lazy Evaluation.............................................................................161 6.8 Parallel Processes.......................................................................................162 6.9 What to Do about It..................................................................................163 Bibliographical Notes.........................................................................................164 Exercises...............................................................................................................164 7 Implicit Assumptions.................................................................167 7.1 Handling Exceptional Situations.............................................................167 7.1.1 Exception Handling......................................................................168 7.1.2 Initializing Function Calls...........................................................169 7.2 Testing for Fundamental Requirements.................................................171 7.3 What to Do about It..................................................................................174 Bibliographical Notes.........................................................................................174 Exercises...............................................................................................................175 8 Implications of the Finiteness of the Representation of Numbers..................................................................................177 8.1 Bit and Word Complexity Revisited.......................................................177 8.2 Testing for Equality...................................................................................180 8.3 Mathematical Properties...........................................................................183 8.4 Convergence...............................................................................................185 8.5 What to Do about It..................................................................................186 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page x Monday, July 3, 2006 2:30 PM Bibliographical Notes.........................................................................................186 Exercises...............................................................................................................187 9 Asymptotic Complexities and the Selection of Algorithms..............................................................................189 9.1 Introduction................................................................................................189 9.2 The Importance of Hidden Constants....................................................190 9.3 Crossover Points........................................................................................193 9.4 Practical Considerations for Efficient Software: What Matters and What Does Not.........................................................196 Bibliographical Notes.........................................................................................197 Exercises...............................................................................................................198 10 Infeasibility and Undecidability: Implications for Software Development...............................................................199 10.1 Introduction................................................................................................199 10.2 Undecidability............................................................................................201 10.3 Infeasibility.................................................................................................203 10.4 NP-Completeness......................................................................................207 10.5 Practical Considerations...........................................................................208 Bibliographical Notes.........................................................................................209 Exercises...............................................................................................................210 Part 3 Conclusion Appendix I: Algorithms Every Programmer Should Know.....................217 Bibliographical Notes.........................................................................................223 Appendix II: Overview of Systems Implicated in Program Analysis...225 II.1 Introduction................................................................................................225 II.2 The Memory Hierarchy............................................................................225 II.3 Virtual Memory Management.................................................................227 II.4 Optimizing Compilers..............................................................................228 II.4.1 Basic Optimizations......................................................................229 II.4.2 Data Flow Analysis.......................................................................229 II.4.3 Interprocedural Optimizations...................................................230 II.4.4 Data Dependence Analysis..........................................................230 II.4.5 Code Transformations..................................................................231 II.4.6 I/O Issues.......................................................................................231 II.5 Garbage Collection....................................................................................232 Bibliographical Notes.........................................................................................234 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page xi Monday, July 3, 2006 2:30 PM Appendix III: NP-Completeness and Higher Complexity Classes.........237 III.1 Introduction................................................................................................237 III.2 NP-Completeness......................................................................................237 III.3 Higher Complexity Classes......................................................................240 Bibliographical Notes.........................................................................................241 Appendix IV: Review of Undecidability......................................................243 IV.1 Introduction................................................................................................243 IV.2 The Halting Problem for Turing Machines...........................................243 IV.3 Post’s Correspondence Problem..............................................................245 Bibliographical Note...........................................................................................246 Bibliography.......................................................................................................247 © 2007 by Taylor & Francis Group, LLC