Short-Read DNA Sequence Alignment with Custom Designed FPGA-based Hardware PDF

186 Pages·2010·2.96 MB·English

Preview Short-Read DNA Sequence Alignment with Custom Designed FPGA-based Hardware

Short-Read DNA Sequence Alignment with Custom Designed FPGA-based Hardware
by Adam Hall
B.A., The University of Cambridge, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Bioinformatics)
THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
November 2010
© Adam Hall, 2010

Abstract

The alignment of short DNA read sequencing data to a human reference genome sequence has become a standard step in the analysis pipeline for short DNA read sequence data. As the rate at which short read DNA sequence data is being produced doubles every 5 months, analysis of this data in a computationally efficient way is becoming increasingly important. We demonstrate how we can exploit the “embarrassingly parallel” property of short read sequence alignment in custom-designed hardware in FPGAs. Hardware is chosen, a system is designed, and this system is implemented. My FPGA-based hit finder was demonstrated to produce correct hit results. The performance of this single FPGA implementation was demonstrated to be 71,000 seed hits found per hour on a human genome sized reference sequence. The implementation was demonstrated to produce identical results to the hit finder stage of the MAQ aligner. We demonstrate that the price/performance of this sliding-window FPGA aligner (∼355 seeds/hr/$) compares favorably to the price/performance of sliding-window software aligners (∼67.5 seeds/hr/$ for MAQ). However, software aligners which are based on the superior Burrows-Wheeler alignment algorithm still have a significant price/performance advantage over the FPGA-based approach (∼7,200 seeds/hr/$). We predict that as chips continue to increase in size due to Moore's Law and computation is performed in high-density cloud-computing datacenters, the FPGA-based approach will become preferable to current software aligners.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Aim
  1.2 Technical Background
    1.2.1 Illumina Short-Read DNA Sequencing
    1.2.2 The Short-Read Alignment Problem
    1.2.3 Field Programmable Gate Arrays (FPGAs)
    1.2.4 Programming FPGAs
    1.2.5 Instantiating a Soft-Core Processor in an FPGA
    1.2.6 Adoption of the “Cloud Computing” Model in Bioinformatics (Stein) (Baker)
    1.2.7 How BLAST (Basic Local Alignment Search Tool) and Other Related Algorithms Work
  1.3 Software Aligners
    1.3.1 The Indexing/Hit Finding/Hit Extension Paradigm
    1.3.2 Error Models
    1.3.3 ELAND
    1.3.4 MAQ (Mapping and Assembly with Qualities) (Li, Ruan, and Durbin)
    1.3.5 SOAP (Li)
    1.3.6 PASS (Campagna, Albiero, Bilardi, Caniato, Forcato, Manavski, Vitulo, and Valle)
    1.3.7 SeqMap (Jiang and Wong)
    1.3.8 Slider (Malhis, Butterfield, Ester, and Jones)
    1.3.9 Bowtie (Li and Durbin)
  1.4 Other Related Work
    1.4.1 Dynamic Programming in FPGAs
    1.4.2 Other Previous Uses of FPGAs in Bioinformatics
    1.4.3 A Previous Implementation of a Short Read Aligner in FPGA Hardware (McMahon)
2 Overall System Architecture
  2.1 Basic Idea
  2.2 First Implementation Attempt: Using the Cray XD1 FPGA-Accelerated Computer
  2.3 Second Implementation Attempt: Development of a PCI-Express based Accelerator Card for the Host Workstation
  2.4 Third, Final Implementation Attempt: Development of an Ethernet-based Appliance
  2.5 Choice of Development Tools
  2.6 Development Hardware Setup
  2.7 Design Decision: Where to Store the Reference Sequence
  2.8 Design Decision: Which Devices are Used on the DE2-70 Board and What Happens to the Rest
  2.9 Adapting the Basic Idea to Short Read Alignment with the DE2-70 Board
  2.10 Design Decision: Method of Getting Reference Sequence Data into the Query Generator Sliding Window
    2.10.1 Method 1
    2.10.2 Method 2
    2.10.3 Method 3
  2.11 System Components
3 SOPC Controller Development
  3.1 What the SOPC Controller Does
  3.2 Implementation Method
  3.3 The SOPC Builder
  3.4 Design Version 1
  3.5 Design Version 2
  3.6 Combining the two 32MB SDRAM Chips into a Single 64MB Memory Bank
  3.7 Clock Generation
    3.7.1 Method Used to Generate the Clocks
  3.8 Text FIFO Writer Custom SOPC Component
  3.9 The Seed Register Writer Custom SOPC Component
  3.10 The Seed Register Enable Bit Writer Custom SOPC Component
  3.11 The Control Flag Generator Module
  3.12 The Parameter Readout Module
  3.13 The Multi-Match Hit Vector Readout Module
  3.14 Getting Data From the Results FIFO into the SOPC
4 Alignment Pipeline Development
  4.1 Overall Pipeline Design
  4.2 Design Decision: Allowing Multiple Identical Seeds in a Batch
  4.3 Detailed Pipeline Design
  4.4 The Text FIFO
  4.5 The Query Generator
  4.6 The Seed Comparison Module
  4.7 The Priority Encoder
  4.8 The Results FIFO
  4.9 Computing the Stall Signal
  4.10 Alignment Pipeline Global Reset
  4.11 Increasing Clock Frequency with Pipelining
5 Embedded Software Development
  5.1 What the Embedded Software Does
  5.2 Overall Design
  5.3 States and State Transitions
  5.4 The Main Loop
    5.4.1 Initialization
    5.4.2 Loop Contents
    5.4.3 Shut Down
  5.5 The Bit Manipulation Functions
    5.5.1 Other Support Functions Written
  5.6 Retrieving Configuration Parameters from the Alignment Appliance Hardware
  5.7 Initialization of Seed Registers and Comparison Module Enable Registers
  5.8 Design Decision: How the Reference Genome is Contained in the SDRAM Text Buffer
  5.9 Other Functions Performed by the Embedded Software
  5.10 Development of Driver Software for the DM9000A Ethernet Interface
    5.10.1 Ethernet Interface Configuration
    5.10.2 Method to Read from the DM9000A Ethernet Frame Receive Buffer
    5.10.3 Method to Send an Ethernet Frame with the DM9000A Ethernet Interface Controller
6 Development of Appliance Control Application for the Workstation
  6.1 What the Workstation Software Does
  6.2 Choice of Library for Sending and Receiving Ethernet Frames
  6.3 Threads
  6.4 Type of Packets Used
  6.5 Flow Control
  6.6 Receiving Ethernet Frames
  6.7 Parsing the Command Line Arguments
  6.8 Writing Unsigned Integer Classes for Java
  6.9 Retrieving Configuration Parameters from the Alignment Appliance
  6.10 How a Reference Sequence is Loaded into the Alignment Appliance
    6.10.1 Loading the Reference Genome from Disk to Memory
    6.10.2 Transmitting the Reference Genome from Memory to the SDRAM on the DE2-70 Board
  6.11 How Reads are Uploaded into the Alignment Appliance
    6.11.1 Loading the Reads from Disk to Memory
    6.11.2 Packaging Seeds into Packets
  6.12 Packet Transmission and Reception While in Operation
7 Development of an Application to Resolve Ambiguities in Reference Genome Files
  7.1 Introduction to Ambiguity Resolution in Reference Genomes
  7.2 Design and Implementation of an Ambiguity Resolution Application
  7.3 Software Engineering Issues
  7.4 Testing the Ambiguity Resolution Application
8 Correctness Testing
  8.1 Choice of Test Reference Sequence
  8.2 Choice of Test Seed Dataset
  8.3 Generation of Correct Hit Results
  8.4 How the Two Sets of Hit Results are Demonstrated to be Identical
9 Performance Measurement
  9.1 Configuration of FPGA Binary Used for Performance Measurement
  9.2 Choice of Reference Sequence
  9.3 Choice of Read Dataset
  9.4 Method of Measuring Performance
  9.5 Performance Measurement Values
  9.6 Extrapolation of Performance Values to a Human-Size Reference Sequence
10 Conclusion and Future Work
  10.1 Comparison of FPGA-Based Hit Finder and Microprocessor-Based Hit Finders
    10.1.1 Hardware Cost
    10.1.2 Inexact Matching
  10.2 Demonstration that the FPGA Hit Finder Produces Identical Results to MAQ
  10.3 Ideas for Future Work
  10.4 Future Work: Handling Ambiguous Characters in the Reference Genome in Hardware
  10.5 Improvements that Could be Made to the Implementation
  10.6 Performance Scaling with Larger Chips
Bibliography

List of Tables

1.1 The sliding window algorithm.
1.2 Some values of $\binom{2n}{n}$, the number of hash tables needed for an n-mismatch alignment.
3.1 The SOPC controller components not shown on the diagram.
3.2 The flags which the control flag generator module can generate.
4.1 The reasons that the alignment pipeline can stall.
4.2 The effects of a stall event in the alignment pipeline.
5.1 The states the alignment appliance embedded software can be in.
5.2 The nine frame types that can be sent from the host workstation to the alignment appliance.

List of Figures

1.1 Analysis of data from a short read sequencing instrument.
1.2 The Verilog code from listing 1.2.4 shown in the equivalent circuit diagram form.
1.3 The three phases of short read alignment.
1.4 An example of 2b/nt encoding.
1.5 Generating query subsequences with a sliding window.
1.6 The table used for the direct address method.
1.7 The exact matching hash table method.
1.8 The one-mismatch hash table method.
1.9 The two-mismatch hash table method.
2.1 Method 1 for implementing custom hardware in an FPGA for embarrassingly parallel problems.
2.2 Method 2 for implementing custom hardware in an FPGA for embarrassingly parallel problems.
2.3 Properties of the subproblems of alignment.
2.4 Image of the Arria GX FPGA development board.
2.5 How the Ethernet-based appliance connects to the host workstation.
2.6 Image of the Altera DE2-70 board.
2.7 The hardware setup used during development and testing.
2.8 Adapting the basic multi-parallel-module to a hit finder on the DE2-70 board.
2.9 The design of a comparison module.
2.10 The hardware of a seed register.
3.1 How the SOPC Builder is used.
3.2 Version one of the controller SOPC.
3.3 The method used to generate flags in version one of the controller.
3.4 Version two of the controller SOPC.
3.5 How a single SDRAM controller drives two SDRAM chips.
3.6 The text FIFO writer custom SOPC component.
3.7 The seed register writer custom SOPC component.
3.8 The seed register enable bits writer custom SOPC component.
3.9 The aligner control flags module.
3.10 The configuration parameter readout module.
3.11 The multi-match hit vector readout module.
4.1 High level design diagram of the alignment pipeline.
4.2 Detailed design diagram of the alignment pipeline.
4.3 The internals of the seed comparison module.
4.4 A priority encoder constructed recursively.
4.5 My modified priority encoder constructed recursively with the extra multi match signal. Note that “multi” is used as short for “multi match” to save space on the diagram.
4.6 The pipelined version of the alignment pipeline detailed design.
5.1 The allowed state transitions of the alignment appliance embedded software controller.
5.2 The layout of the reference genome in the SDRAM of the appliance.
6.1 The transmission and reception of frames between the alignment appliance and host workstation.
7.1 Correctness testing the wrong way. Notice that the two aligners are aligning to different reference sequences because their ambiguity resolvers work differently.
7.2 Correctness testing the right way. Notice that the two aligners are aligning to identical reference sequences.
7.3 Internal structure of the ambiguity resolution Java application.
8.1 How the Test Datasets were Generated.
8.2 Producing hit finding results for correctness testing.
10.1 Hardware for Comparing a 4-bit Encoded Nucleotide with a 2-bit Encoded Nucleotide.
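
The preview names a sliding-window hit finder (Table 1.1, Figure 1.5) and a 2b/nt encoding (Figure 1.4) but does not show the algorithm itself. The following is a minimal software sketch of that general idea, not the thesis's hardware design. The function names, the A/C/G/T bit assignment, and the dictionary lookup (which stands in for the parallel seed comparison modules) are all assumptions made for illustration. It also assumes the reference contains only A, C, G, and T, that is, that ambiguous characters have already been resolved.

```python
# Hypothetical sketch: names and the bit assignment below are assumptions,
# not taken from the thesis.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per nucleotide


def encode(seq):
    """Pack a nucleotide string into an integer, 2 bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def find_hits(reference, seeds):
    """Slide a seed-length window over the reference and report exact hits.

    Returns a list of (position, seed_index) pairs ordered by position.
    All seeds must have the same length.
    """
    if not seeds:
        return []
    k = len(seeds[0])
    if any(len(s) != k for s in seeds):
        raise ValueError("all seeds must have the same length")

    # Map each encoded seed to every index that holds it, so that
    # identical seeds in one batch are all reported.
    table = {}
    for i, s in enumerate(seeds):
        table.setdefault(encode(s), []).append(i)

    mask = (1 << (2 * k)) - 1  # keeps only the last k nucleotides
    hits = []
    window = 0
    for pos, base in enumerate(reference):
        # Shift the new nucleotide in and drop the oldest one.
        window = ((window << 2) | ENCODE[base]) & mask
        start = pos - k + 1
        if start >= 0:  # window is full once k nucleotides have been read
            for i in table.get(window, []):
                hits.append((start, i))
    return hits


if __name__ == "__main__":
    print(find_hits("ACGTACGT", ["ACG", "GTA", "ACG"]))
```

In this sketch the reference is streamed once, one nucleotide per step, which mirrors the streaming character of a sliding window; a hardware version would compare each window against all seed registers at the same time instead of consulting a dictionary.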

Description:
.. as an embedded controller for a larger system, or blocks on an FPGA can be used to create high-performance amino acid sequence.