
The secret of getting ahead is getting started. PDF

1585 Pages·2014·10.66 MB·English
by James Alexander Hughes


“The secret of getting ahead is getting started.” - Mark Twain

A Study of Ordered Gene Problems Featuring DNA Error Correction and DNA Fragment Assembly with a Variety of Heuristics, Genetic Algorithm Variations, and Dynamic Representations

James Alexander Hughes
Department of Computer Science

Supervisor: Dr. Sheridan Houghten
Collaborators: Daniel Ashlock (University of Guelph), Guillermo M. Mallén-Fullerton (Universidad Iberoamericana), Joseph Alexander Brown (Magna International)

Submitted in partial fulfilment of the requirements for the degree of Master of Science

Faculty of Mathematics and Science, Brock University
St. Catharines, Ontario

© James Alexander Hughes 2014

Abstract

Ordered gene problems are a very common class of optimization problems. Because of their popularity, countless algorithms have been developed in an attempt to find high quality solutions to them. It is also common to see many different types of problems reduced to ordered gene style problems, since many popular heuristics and metaheuristics already exist for them.

Multiple ordered gene problems are studied, namely the travelling salesman problem, the bin packing problem, and the graph colouring problem. In addition, two bioinformatics problems not traditionally seen as ordered gene problems are studied: DNA error correction and DNA fragment assembly.

These problems are studied with multiple variations and combinations of heuristics and metaheuristics, using two distinct types of representations. The majority of the algorithms are built around the Recentering-Restarting Genetic Algorithm.

The algorithm variations were successful on all problems studied, and particularly on the two bioinformatics problems. For DNA Error Correction, multiple cases were found with 100% of the codes being corrected. The algorithm variations were also able to beat all other state-of-the-art DNA Fragment Assemblers on 13 out of 16 benchmark problem instances.
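To make the term "ordered gene problem" concrete, the following is a minimal sketch, not taken from the thesis, of how a travelling salesman tour can be encoded as a permutation of city indices and improved with a plain 2-Opt pass. The city coordinates and the function names `tour_length` and `two_opt` are hypothetical, chosen only for illustration; the thesis's own algorithms and parameters may differ.

```python
import math
from itertools import combinations


def tour_length(order, cities):
    """Total length of the closed tour that visits cities in the given order."""
    n = len(order)
    return sum(
        math.dist(cities[order[i]], cities[order[(i + 1) % n]])
        for i in range(n)
    )


def two_opt(order, cities):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    best = list(order)
    found_shorter = True
    while found_shorter:
        found_shorter = False
        # The first city stays fixed; every other segment is tried.
        for i, j in combinations(range(1, len(best)), 2):
            candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
            if tour_length(candidate, cities) < tour_length(best, cities) - 1e-12:
                best, found_shorter = candidate, True
    return best


# Hypothetical data: the four corners of a unit square.
cities = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]             # chromosome: a permutation of city indices
better = two_opt(start, cities)  # still a permutation, but a shorter tour
```

The point of the sketch is that the chromosome is an ordering, so any operator applied to it (here a segment reversal) has to return another valid permutation. The same encoding idea carries over to other problems that can be phrased as "find a good ordering".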
Acknowledgements

Firstly I would like to thank my supervisor, Dr. Sheridan Houghten for her constant encouragement and support. Although many times I would curse her high expectations I know that it was because of this I was able to achieve great things. Dr. Houghten has given me many opportunities I would otherwise not have; she really is a SUPERvisor.

I would also like to thank Dr. Daniel Ashlock (University of Guelph) and señor Guillermo M. Mallén-Fullerton (Universidad Iberoamericana) for their significant help on this work. Without them this work would not have been nearly as successful.

I would like to acknowledge the advisory committee for the time and patience they have put towards my research.

A special thanks goes to the Brock University Computer Science Department. Everyone within the department has given me support throughout my undergraduate and graduate studies. I know that they were a very integral part of my education and without them this work would have been impossible. I would particularly like to thank Cale Fairchild for his continuous help and support. No amount of thanks would compensate for the unbelievable amount of time he has given me.

Thank you to NSERC as this work was partially funded by the Natural Sciences and Engineering Research Council of Canada.

Lastly I would like to thank my friends and family, especially Matea Drljepan. They were there for me well beyond my formal education. Without their ludicrous support, patience, tolerance, time, and acceptance I would not be where I am today.

Thank you,
J. A. H.

Contents

1 Introduction
    1.1 Problem Statement
    1.2 Organization of Thesis
2 Problem Descriptions
    2.1 Travelling Salesman Problem
        2.1.1 Small City Problems
    2.2 Bin Packing Problem
    2.3 Graph Colouring Problem
    2.4 DNA Error Correction
    2.5 DNA Fragment Assembly Problem
3 Literature Review
    3.1 Heuristics
        3.1.1 Nearest Neighbour
        3.1.2 Minimal Spanning Tree
        3.1.3 2-Opt
        3.1.4 Lin-Kernighan
    3.2 Genetic Algorithm Variations
        3.2.1 Island Model
        3.2.2 Ring Species
    3.3 Recentering-Restarting Genetic Algorithm
    3.4 Travelling Salesman Problem
    3.5 Bin Packing Problem
        3.5.1 A Branch-and-Cut-and-Price Algorithm for One-Dimensional Stock Cutting and Two-Dimensional Two-Stage Cutting
        3.5.2 A Hybrid Grouping Genetic Algorithm for Bin Packing
    3.6 Graph Colouring Problem
        3.6.1 Data Sets
        3.6.2 Graph Coloring with Adaptive Evolutionary Algorithms
        3.6.3 A New Genetic Graph Coloring Heuristic
    3.7 DNA Error Correction with Side Effect Machines
        3.7.1 Edit Metric Decoding: A New Hope
        3.7.2 Decoding Algorithms Using Side-Effect Machines
    3.8 DNA Fragment Assembly Problem
        3.8.1 DNA Sequence Assembly and Genetic Algorithms, New Results and Puzzling Insights
        3.8.2 A Genetic Algorithm Approach to Solving DNA Fragment Assembly Problem
        3.8.3 Metaheuristics for the DNA fragment assembly problem
        3.8.4 A Hybrid Genetic Algorithm for the DNA Fragment Assembly Problem
        3.8.5 DNA Fragment Assembly Using a Grid-Based Genetic Algorithm
        3.8.6 SAX: a new and efficient assembler for solving DNA Fragment Assembly Problem
        3.8.7 An efficient genome fragment assembling using GA with neighbourhood aware fitness function
        3.8.8 Solving the DNA Fragment Assembly Problem Efficiently Using Iterative Optimization with Evolved Hypermutations
        3.8.9 Bee Algorithms for Solving DNA Fragment Assembly Problem With Noisy and Noiseless Data
        3.8.10 DNA Fragment Assembly Using Optimization from Nature Inspired Algorithms to Formal Methods
        3.8.11 De Bruijn Graph Approaches
4 Algorithms
    4.1 Heuristics
        4.1.1 Nearest Neighbour
        4.1.2 Minimal Spanning Tree
        4.1.3 2-Opt
        4.1.4 Lin-Kernighan
    4.2 Genetic Algorithm
        4.2.1 Genetic Algorithm Definitions
    4.3 Genetic Algorithm Variations
        4.3.1 Ring Species
        4.3.2 Island Model
        4.3.3 Recentering-Restarting Genetic Algorithm
    4.4 Representation
        4.4.1 Direct Representation
        4.4.2 Indirect Transposition Representation
    4.5 Genetic Operators
        4.5.1 Crossovers
        4.5.2 Mutation
    4.6 Selection
5 Experimental Design
    5.1 Travelling Salesman Problem: Small Problem Instances
        5.1.1 Purpose
        5.1.2 Fitness
        5.1.3 Data Sets
        5.1.4 System Parameters
        5.1.5 Analysis
    5.2 Bin Packing Problem
        5.2.1 Purpose
        5.2.2 Fitness
        5.2.3 Data Sets
        5.2.4 System Parameters
        5.2.5 Analysis
    5.3 Graph Colouring Problem
        5.3.1 Purpose
        5.3.2 Fitness
        5.3.3 Data Sets
        5.3.4 System Parameters
        5.3.5 Analysis
    5.4 Travelling Salesman Problem: Large Problem Instances
        5.4.1 Purpose
        5.4.2 Fitness
        5.4.3 Data Sets
        5.4.4 System Parameters
        5.4.5 Analysis
    5.5 DNA Error Correction
        5.5.1 Side Effect Machines
        5.5.2 Purpose
        5.5.3 Fitness
        5.5.4 Data Sets
        5.5.5 System Parameters
        5.5.6 Analysis
    5.6 DNA Fragment Assembly
        5.6.1 Purpose
        5.6.2 Fitness
        5.6.3 Data Sets
        5.6.4 System Parameters
        5.6.5 Analysis
6 Analysis and Discussion
    6.1 Travelling Salesman Problem: Small Problem Instances
        6.1.1 Increased Number of Generations
    6.2 Bin Packing Problem
    6.3 Graph Colouring Problem
    6.4 Travelling Salesman Problem: Large Problem Instances
    6.5 DNA Error Correction
        6.5.1 First Set of Experiments
        6.5.2 Second Set of Experiments
    6.6 DNA Fragment Assembly
        6.6.1 First Set of Results
        6.6.2 Second Set of Results
        6.6.3 Third Set of Results
        6.6.4 Fourth Set of Results
7 Conclusions and Future Work
    7.1 Travelling Salesman Problem: Small Problem Instances
    7.2 Bin Packing Problem
    7.3 Graph Colouring Problem
    7.4 Travelling Salesman Problem: Large Problem Instances
    7.5 DNA Error Correction
    7.6 DNA Fragment Assembly
    7.7 Future Work
Bibliography
Appendices
A Tables
    A.1 Travelling Salesman Problem: Small Problem Instances
        A.1.1 Increased Number of Generations
    A.2 Bin Packing Problem
        A.2.1 u Instance Results
        A.2.2 hard28 Instance Results
    A.3 Graph Colouring Problem
    A.4 Travelling Salesman Problem: Large Problem Instances
        A.4.1 Results With No Post Optimization
        A.4.2 Results With Post Optimization
    A.5 DNA Error Correction
        A.5.1 Code 1
        A.5.2 Code 2
        A.5.3 Code 3
    A.6 DNA Fragment Assembly
        A.6.1 First Set of Results
        A.6.2 Second Set of Results
        A.6.3 Third Set of Results
        A.6.4 Fourth Set of Results
B Graphs
    B.1 Bin Packing Problem
        B.1.1 u Instance Results
        B.1.2 hard28 Instance Results
    B.2 Graph Colouring Problem
    B.3 Travelling Salesman Problem: Large Problem Instances
        B.3.1 No Post Optimization
        B.3.2 Post Optimization
        B.3.3 Comparison Between No Post Optimization Results and Post Optimization Results
    B.4 DNA Error Correction
        B.4.1 Code 1
        B.4.2 Code 2
        B.4.3 Code 3
    B.5 DNA Fragment Assembly
        B.5.1 First Set of Results
        B.5.2 Second Set of Results
        B.5.3 Third Set of Results
        B.5.4 Fourth Set of Results

