Short-Read DNA Sequence Alignment with Custom Designed FPGA-based Hardware

by

Adam Hall

B.A., The University of Cambridge, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

November 2010

© Adam Hall, 2010

Abstract

Aligning short DNA reads to a human reference genome sequence has become a standard step in the analysis pipeline for short-read DNA sequence data. Because the rate at which short-read DNA sequence data is produced doubles every 5 months, analyzing this data in a computationally efficient way is becoming increasingly important. We demonstrate how the “embarrassingly parallel” property of short-read sequence alignment can be exploited with custom-designed hardware in FPGAs. Hardware is chosen, a system is designed, and this system is implemented. My FPGA-based hit finder was demonstrated to produce correct hit results. The performance of this single-FPGA implementation was demonstrated to be 71,000 seed hits found per hour on a human-genome-sized reference sequence. The implementation was demonstrated to produce results identical to those of the hit finder stage of the MAQ aligner. We demonstrate that the price/performance of this sliding-window FPGA aligner (∼355 seeds/hr/$) compares favorably to the price/performance of sliding-window software aligners (∼67.5 seeds/hr/$ for MAQ). However, software aligners based on the superior Burrows-Wheeler alignment algorithm still have a significant price/performance advantage over the FPGA-based approach (∼7,200 seeds/hr/$). We predict that, as chips continue to increase in size due to Moore's Law and computation is performed in high-density cloud-computing datacenters, the FPGA-based approach will become preferable to current software aligners.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Aim
  1.2 Technical Background
    1.2.1 Illumina Short-Read DNA Sequencing
    1.2.2 The Short-Read Alignment Problem
    1.2.3 Field Programmable Gate Arrays (FPGAs)
    1.2.4 Programming FPGAs
    1.2.5 Instantiating a Soft-Core Processor in an FPGA
    1.2.6 Adoption of the “Cloud Computing” Model in Bioinformatics (Stein) (Baker)
    1.2.7 How BLAST (Basic Local Alignment Search Tool) and Other Related Algorithms Work
  1.3 Software Aligners
    1.3.1 The Indexing/Hit Finding/Hit Extension Paradigm
    1.3.2 Error Models
    1.3.3 ELAND
    1.3.4 MAQ (Mapping and Assembly with Qualities) (Li, Ruan, and Durbin)
    1.3.5 SOAP (Li)
    1.3.6 PASS (Campagna, Albiero, Bilardi, Caniato, Forcato, Manavski, Vitulo, and Valle)
    1.3.7 SeqMap (Jiang and Wong)
    1.3.8 Slider (Malhis, Butterfield, Ester, and Jones)
    1.3.9 Bowtie (Li and Durbin)
  1.4 Other Related Work
    1.4.1 Dynamic Programming in FPGAs
    1.4.2 Other Previous Uses of FPGAs in Bioinformatics
    1.4.3 A Previous Implementation of a Short Read Aligner in FPGA Hardware (McMahon)
2 Overall System Architecture
  2.1 Basic Idea
  2.2 First Implementation Attempt: Using the Cray XD1 FPGA-Accelerated Computer
  2.3 Second Implementation Attempt: Development of a PCI-Express based Accelerator Card for the Host Workstation
  2.4 Third, Final Implementation Attempt: Development of an Ethernet-based Appliance
  2.5 Choice of Development Tools
  2.6 Development Hardware Setup
  2.7 Design Decision: Where to Store the Reference Sequence
  2.8 Design Decision: Which Devices are Used on the DE2-70 Board and What Happens to the Rest
  2.9 Adapting the Basic Idea to Short Read Alignment with the DE2-70 Board
  2.10 Design Decision: Method of Getting Reference Sequence Data into the Query Generator Sliding Window
    2.10.1 Method 1
    2.10.2 Method 2
    2.10.3 Method 3
  2.11 System Components
3 SOPC Controller Development
  3.1 What the SOPC Controller Does
  3.2 Implementation Method
  3.3 The SOPC Builder
  3.4 Design Version 1
  3.5 Design Version 2
  3.6 Combining the Two 32MB SDRAM Chips into a Single 64MB Memory Bank
  3.7 Clock Generation
    3.7.1 Method Used to Generate the Clocks
  3.8 Text FIFO Writer Custom SOPC Component
  3.9 The Seed Register Writer Custom SOPC Component
  3.10 The Seed Register Enable Bit Writer Custom SOPC Component
  3.11 The Control Flag Generator Module
  3.12 The Parameter Readout Module
  3.13 The Multi-Match Hit Vector Readout Module
  3.14 Getting Data From the Results FIFO into the SOPC
4 Alignment Pipeline Development
  4.1 Overall Pipeline Design
  4.2 Design Decision: Allowing Multiple Identical Seeds in a Batch
  4.3 Detailed Pipeline Design
  4.4 The Text FIFO
  4.5 The Query Generator
  4.6 The Seed Comparison Module
  4.7 The Priority Encoder
  4.8 The Results FIFO
  4.9 Computing the Stall Signal
  4.10 Alignment Pipeline Global Reset
  4.11 Increasing Clock Frequency with Pipelining
5 Embedded Software Development
  5.1 What the Embedded Software Does
  5.2 Overall Design
  5.3 States and State Transitions
  5.4 The Main Loop
    5.4.1 Initialization
    5.4.2 Loop Contents
    5.4.3 Shut Down
  5.5 The Bit Manipulation Functions
    5.5.1 Other Support Functions Written
  5.6 Retrieving Configuration Parameters from the Alignment Appliance Hardware
  5.7 Initialization of Seed Registers and Comparison Module Enable Registers
  5.8 Design Decision: How the Reference Genome is Contained in the SDRAM Text Buffer
  5.9 Other Functions Performed by the Embedded Software
  5.10 Development of Driver Software for the DM9000A Ethernet Interface
    5.10.1 Ethernet Interface Configuration
    5.10.2 Method to Read from the DM9000A Ethernet Frame Receive Buffer
    5.10.3 Method to Send an Ethernet Frame with the DM9000A Ethernet Interface Controller
6 Development of Appliance Control Application for the Workstation
  6.1 What the Workstation Software Does
  6.2 Choice of Library for Sending and Receiving Ethernet Frames
  6.3 Threads
  6.4 Type of Packets Used
  6.5 Flow Control
  6.6 Receiving Ethernet Frames
  6.7 Parsing the Command Line Arguments
  6.8 Writing Unsigned Integer Classes for Java
  6.9 Retrieving Configuration Parameters from the Alignment Appliance
  6.10 How a Reference Sequence is Loaded into the Alignment Appliance
    6.10.1 Loading the Reference Genome from Disk to Memory
    6.10.2 Transmitting the Reference Genome from Memory to the SDRAM on the DE2-70 Board
  6.11 How Reads are Uploaded into the Alignment Appliance
    6.11.1 Loading the Reads from Disk to Memory
    6.11.2 Packaging Seeds into Packets
  6.12 Packet Transmission and Reception While in Operation
7 Development of an Application to Resolve Ambiguities in Reference Genome Files
  7.1 Introduction to Ambiguity Resolution in Reference Genomes
  7.2 Design and Implementation of an Ambiguity Resolution Application
  7.3 Software Engineering Issues
  7.4 Testing the Ambiguity Resolution Application
8 Correctness Testing
  8.1 Choice of Test Reference Sequence
  8.2 Choice of Test Seed Dataset
  8.3 Generation of Correct Hit Results
  8.4 How the Two Sets of Hit Results are Demonstrated to be Identical
9 Performance Measurement
  9.1 Configuration of FPGA Binary Used for Performance Measurement
  9.2 Choice of Reference Sequence
  9.3 Choice of Read Dataset
  9.4 Method of Measuring Performance
  9.5 Performance Measurement Values
  9.6 Extrapolation of Performance Values to a Human-Size Reference Sequence
10 Conclusion and Future Work
  10.1 Comparison of FPGA-Based Hit Finder and Microprocessor-Based Hit Finders
    10.1.1 Hardware Cost
    10.1.2 Inexact Matching
  10.2 Demonstration that the FPGA Hit Finder Produces Identical Results to MAQ
  10.3 Ideas for Future Work
  10.4 Future Work: Handling Ambiguous Characters in the Reference Genome in Hardware
  10.5 Improvements that Could be Made to the Implementation
  10.6 Performance Scaling with Larger Chips
Bibliography

List of Tables

1.1 The sliding window algorithm.
1.2 Some values of $\binom{2n}{n}$, the number of hash tables needed for an n-mismatch alignment.
3.1 The SOPC controller components not shown on the diagram.
3.2 The flags which the control flag generator module can generate.
4.1 The reasons that the alignment pipeline can stall.
4.2 The effects of a stall event in the alignment pipeline.
5.1 The states the alignment appliance embedded software can be in.
5.2 The nine frame types that can be sent from the host workstation to the alignment appliance.

List of Figures

1.1 Analysis of data from a short read sequencing instrument.
1.2 The Verilog code from listing 1.2.4 shown in the equivalent circuit diagram form.
1.3 The three phases of short read alignment.
1.4 An example of 2b/nt encoding.
1.5 Generating query subsequences with a sliding window.
1.6 The table used for the direct address method.
1.7 The exact matching hash table method.
1.8 The one-mismatch hash table method.
1.9 The two-mismatch hash table method.
2.1 Method 1 for implementing custom hardware in an FPGA for embarrassingly parallel problems.
2.2 Method 2 for implementing custom hardware in an FPGA for embarrassingly parallel problems.
2.3 Properties of the subproblems of alignment.
2.4 Image of the Arria GX FPGA development board.
2.5 How the Ethernet-based appliance connects to the host workstation.
2.6 Image of the Altera DE2-70 board.
2.7 The hardware setup used during development and testing.
2.8 Adapting the basic multi-parallel-module to a hit finder on the DE2-70 board.
2.9 The design of a comparison module.
2.10 The hardware of a seed register.
3.1 How the SOPC Builder is used.
3.2 Version one of the controller SOPC.
3.3 The method used to generate flags in version one of the controller.
3.4 Version two of the controller SOPC.
3.5 How a single SDRAM controller drives two SDRAM chips.
3.6 The text FIFO writer custom SOPC component.
3.7 The seed register writer custom SOPC component.
3.8 The seed register enable bits writer custom SOPC component.
3.9 The aligner control flags module.
3.10 The configuration parameter readout module.
3.11 The multi-match hit vector readout module.
4.1 High level design diagram of the alignment pipeline.
4.2 Detailed design diagram of the alignment pipeline.
4.3 The internals of the seed comparison module.
4.4 A priority encoder constructed recursively.
4.5 My modified priority encoder constructed recursively with the extra multi match signal. Note that “multi” is used as short for “multi match” to save space on the diagram.
4.6 The pipelined version of the alignment pipeline detailed design.
5.1 The allowed state transitions of the alignment appliance embedded software controller.
5.2 The layout of the reference genome in the SDRAM of the appliance.
6.1 The transmission and reception of frames between the alignment appliance and host workstation.
7.1 Correctness testing the wrong way. Notice that the two aligners are aligning to different reference sequences because their ambiguity resolvers work differently.
7.2 Correctness testing the right way. Notice that the two aligners are aligning to identical reference sequences.
7.3 Internal structure of the ambiguity resolution Java application.
8.1 How the Test Datasets were Generated.
8.2 Producing hit finding results for correctness testing.
10.1 Hardware for Comparing a 4-bit Encoded Nucleotide with a 2-bit Encoded Nucleotide.