Massive Parallelism for Combinatorial Problems by Hardware Acceleration with an Application to the Label Switching Problem

Edward Steere

A dissertation submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science in Engineering.

I declare that this dissertation is my own unaided work. It is being submitted for the degree of Master's in Electrical Engineering at the University of the Witwatersrand, Johannesburg. It has not been submitted before for any degree or examination to any other university.

...................... Edward Steere

Date: ......................

Abstract

This dissertation proposes an approach to solving hard combinatorial problems on massively parallel architectures using parallel metaheuristics.

Combinatorial problems are common in many scientific fields. Scientific progress is constrained by the fact that, even using state-of-the-art algorithms, solving hard combinatorial problems can take days or weeks. This is the case with the Label Switching Problem (LSP) in the field of Bioinformatics. In this field, prior work to solve the LSP has resulted in the program CLUMPP (CLUster Matching and Permutation Program). CLUMPP relies solely on a sequential, classical heuristic, and has had success on smaller, low-complexity problems.

By contrast, this dissertation proposes the Parallel Solvers model for the acceleration of hard combinatorial problems. This model draws on the commonalities evident in the algorithms and strategies of metaheuristics. After investigating the effectiveness of the mechanisms in the Parallel Solvers model with regard to the LSP, the author developed DePermute, an algorithm which solves the LSP significantly faster. Results were generated from time-based testing on simulated data, as well as on data freely available on the Internet as part of various projects.
An investigation into the effectiveness of DePermute was carried out on a CPU (Central Processing Unit) based computer. The time-based testing was carried out on a CPU-based computer and on a Graphics Processing Unit (GPU) attached to a CPU host computer. The dissertation also proposes the design of a Field Programmable Gate Array (FPGA) based implementation of DePermute.

Using Parallel Solvers in the DePermute algorithm, the time taken for population group sizes, K, ranging from K = 5 to 20 was improved by up to two orders of magnitude by the GPU implementation, compared with CLUMPP using aggressive settings. The CPU implementation, while slower than the GPU implementation, still marginally outperforms CLUMPP using aggressive settings, and usually with better quality. In addition, it outperforms CLUMPP by at least an order of magnitude when CLUMPP is set to use higher quality settings.

Combinatorial problems can be very difficult. Parallel Solvers has been effective in solving the LSP in the field of Bioinformatics. This dissertation proposes that it might assist in the reasoning about and design of algorithms in other fields.

Acknowledgements

I would like to acknowledge the help and guidance I received from my supervisor, Professor Scott Hazelhurst, whose thorough approach to research challenged me to improve and whose involvement in the research community introduced me to good friends and colleagues. I would also like to thank the postgraduate students of the University of the Witwatersrand's School of Electrical and Information Engineering for two years spent learning to broaden my understanding of the world through discussion and exploration. I would like to acknowledge the role my family played in supporting me during the time I spent writing the dissertation and starting my career.

Contents

Abstract
1 Introduction
2 Background
  2.1 Combinatorial Problems
    2.1.1 Computational Complexity
    2.1.2 Combinatorial Problem Frameworks
  2.2 Algorithms
    2.2.1 Exact Algorithms
    2.2.2 Approximation Algorithms
  2.3 Metaheuristics
    2.3.1 Metaheuristic Models
  2.4 Practical Computation
    2.4.1 Parallel Model Classification
    2.4.2 Computer Architectures
  2.5 A Combinatorial Problem in Bioinformatics
    2.5.1 Population Structure
    2.5.2 The Problem Which the Model Based Approach Solves
    2.5.3 The Problem Which the Model Based Approach Creates
    2.5.4 The Label Switching Problem
3 Parallel Metaheuristics and Solving the LSP
  3.1 Algorithms and Models
    3.1.1 Parallel Computation
    3.1.2 Computational Bottlenecks
  3.2 Practical Implementations
    3.2.1 Parallel Metaheuristics
    3.2.2 GPU Based Implementations
    3.2.3 FPGA Implementations
  3.3 The Current Approach to Solving the LSP
    3.3.1 A Greedy Algorithm Formulation
    3.3.2 A Fitness Function for the LSP
4 The Parallel Solvers Model
  4.1 Introduction
    4.1.1 Trajectory-Based Metaheuristics Under a New Light
    4.1.2 Population-Based Metaheuristics Under a New Light
    4.1.3 A General Metaheuristic Model
    4.1.4 Class One Operators
    4.1.5 Class Two Operators
    4.1.6 A Universal Sequential Metaheuristic Model
    4.1.7 Information Sharing in Parallel Metaheuristics
    4.1.8 Parallel Strategies for Metaheuristics
    4.1.9 Geometric Division of Problems Solvable by Metaheuristics
  4.2 Specification of the Parallel Solvers Model
    4.2.1 Details of the Parallel Solvers Model
    4.2.2 A Universal Model for Parallel Metaheuristics
  4.3 The DePermute Algorithm
    4.3.1 Algorithmic Model
    4.3.2 Concrete Algorithmic Description of DePermute
    4.3.3 Data Structures
5 Testing the Convergence Rates of Heuristics
  5.1 Motivation
  5.2 Test Design – Abstract Simulation
    5.2.1 Generation of Random Data Sets
    5.2.2 Population Diversification
    5.2.3 Mutation (Trajectory Diversification)
    5.2.4 Elitism (Population Refinement)
    5.2.5 Sub Space Division
    5.2.6 Communication
    5.2.7 Block Alignment
    5.2.8 Critical Discussion of Testing Procedure
6 Practical Testing
  6.1 CPU Based Prototype
    6.1.1 Software Design
    6.1.2 Implementation of the DePermute Algorithm
    6.1.3 Data Structures
    6.1.4 Optimisations
  6.2 GPU Prototype
    6.2.1 Software Design
    6.2.2 Implementation of the DePermute Algorithm
    6.2.3 Data Structures
    6.2.4 Optimisations
  6.3 FPGA Hybrid Computer Prototype
    6.3.1 Software Design
    6.3.2 Implementation of the DePermute Algorithm
    6.3.3 Data Structures
  6.4 Test Objectives
  6.5 Time Based Testing Procedure
    6.5.1 Programs Under Test
    6.5.2 Test Platforms
    6.5.3 Program Compilation
    6.5.4 Data Sets
    6.5.5 Methods of Testing
  6.6 Random Data Set Generation
  6.7 Practical Data Set Acquisition
7 Results
  7.1 Summary of Results
  7.2 System A
    7.2.1 Randomly Generated Data
    7.2.2 Real Data
  7.3 System B
    7.3.1 Randomly Generated Data
    7.3.2 Real Data
  7.4 System C
    7.4.1 Randomly Generated Data
    7.4.2 Real Data
8 Conclusion
A Extra Quality Results
  A.1 System B
  A.2 System C
B Extra Sample Information
C Scripts & Test Programs
  C.1 Random Data Generation
  C.2 Convergence Based Tests
    C.2.1 Shared Scripts
    C.2.2 Test 1
    C.2.3 Test 6
  C.3 Time Based Testing
    C.3.1 Random Data Set Testing
    C.3.2 Real Data Set Testing
D Program Versions
E Test Platforms
F Extra Details of Algorithm Implementations
  F.1 Bitonic Sort
  F.2 Parallel Prefix Sum