Reconfigurable Computing
Accelerating Computation with Field-Programmable Gate Arrays

by

MAYA GOKHALE
Los Alamos National Laboratory, NM, U.S.A.

and

PAUL S. GRAHAM
Los Alamos, NM, U.S.A.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-10 0-387-26105-2 (HB)
ISBN-13 978-0-387-26105-8 (HB)
ISBN-10 0-387-26106-0 (e-book)
ISBN-13 978-0-387-26106-5 (e-book)

Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springeronline.com

Printed on acid-free paper

All Rights Reserved
© 2005 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed in the Netherlands.

Contents

1 An Introduction to Reconfigurable Computing
  1.1 What is RC?
  1.2 RC Architectures
  1.3 How did RC originate?
  1.4 Inside the FPGA
  1.5 Mapping Algorithms to Hardware
  1.6 RC Applications
  1.7 Example: Dot Product
  1.8 Further Reading

2 Reconfigurable Logic Devices
  2.1 Field-Programmable Gate Arrays
    2.1.1 Basic Architecture
    2.1.2 Specialized Function Blocks
    2.1.3 Programming Architecture
  2.2 Coarse-Grained Reconfigurable Arrays
    2.2.1 Raw
    2.2.2 PipeRench
    2.2.3 RaPiD
    2.2.4 PACT XPP
    2.2.5 MathStar
  2.3 Summary

3 Reconfigurable Computing Systems
  3.1 Parallel Processing on Reconfigurable Computers
    3.1.1 Instruction Level Parallelism
    3.1.2 Task Level Parallelism
  3.2 A Survey of Reconfigurable Computing Systems
    3.2.1 I/O Bus Accelerator
    3.2.2 Massively Parallel FPGA array
    3.2.3 Reconfigurable Supercomputer
    3.2.4 Reconfigurable Logic Co-processor
  3.3 Summary

4 Languages and Compilation
  4.1 Design Cycle
  4.2 Languages
    4.2.1 Algorithmic RC Languages
    4.2.2 Hardware Description Languages (HDL)
  4.3 High Level Compilation
    4.3.1 Compiler Phases
    4.3.2 Analysis and Optimizations
    4.3.3 Scheduling
  4.4 Low Level Design Flow
    4.4.1 Logic Synthesis
    4.4.2 Technology Mapping
    4.4.3 Logic Placement
    4.4.4 Signal Routing
    4.4.5 Configuration Bitstreams
  4.5 Debugging Reconfigurable Computing Applications
    4.5.1 Basic Needs for Debugging
    4.5.2 Debugging Facilities
    4.5.3 Challenges for RC Application Debugging
  4.6 Summary

5 Signal Processing Applications
  5.1 What is Digital Signal Processing?
  5.2 Why Use Reconfigurable Computing for DSP?
    5.2.1 Reconfigurable Computing’s Suitability for DSP
    5.2.2 Comparing DSP Implementation Technologies
  5.3 DSP Application Building Blocks
    5.3.1 Basic Operations and Elements
    5.3.2 Filtering
    5.3.3 Transforms
  5.4 Example DSP Applications
    5.4.1 Beamforming
    5.4.2 Software Radio
  5.5 Summary

6 Image Processing
  6.1 RC for Image and Video Processing
  6.2 Local Neighborhood Functions
    6.2.1 Cellular Arrays for Pixel Parallelism
    6.2.2 Image Pipelines for Instruction-Level Parallelism
  6.3 Convolution
  6.4 Morphology
  6.5 Feature Extraction
  6.6 Automatic Target Recognition
  6.7 Image Matching
  6.8 Evolutionary Image Processing
  6.9 Summary

7 Network Security
  7.1 Cryptographic Applications
    7.1.1 Cryptography Basics
    7.1.2 RC Cryptographic Algorithm Implementations
  7.2 Network Protocol Security
    7.2.1 RC Network Interface
    7.2.2 Security Protocols
    7.2.3 Network Defense
  7.3 Summary

8 Bioinformatics Applications
  8.1 Introduction
  8.2 Applications
    8.2.1 Genome Assembly
    8.2.2 Content-Based Search
    8.2.3 Genome Comparison
    8.2.4 Molecular Phylogeny
    8.2.5 Pattern Matching
    8.2.6 Protein Domain Databases
  8.3 Dynamic Programming Algorithms
    8.3.1 Alignments
    8.3.2 Dynamic Programming Equations
    8.3.3 Gap Functions
    8.3.4 Systolic DP Computation
    8.3.5 Backtracking
    8.3.6 Modulo Encoding
    8.3.7 FPGA Implementations
  8.4 Seed-Based Heuristics
    8.4.1 Filtering, Heuristics, and Quality Values
    8.4.2 BLAST: a 3-Stages Heuristic
    8.4.3 Seed Indexing
    8.4.4 FPGA Implementations
  8.5 Profiles, HMMs and Language Models
    8.5.1 Position-Dependent Profiles
    8.5.2 Hidden Markov Models
    8.5.3 Language Models
  8.6 Bioinformatics FPGA Accelerators
    8.6.1 Splash
    8.6.2 Perle
    8.6.3 GenStorm
    8.6.4 RDisk
    8.6.5 BioXL/H
    8.6.6 DeCypher
  8.7 Summary

9 Supercomputing Applications
  9.1 Introduction
  9.2 Monte Carlo Simulation of Radiative Heat Transfer
    9.2.1 Algorithm Description
    9.2.2 Hardware Implementation
    9.2.3 Performance
  9.3 Urban Road Traffic Simulation
    9.3.1 CA Traffic Modeling
    9.3.2 Intersections and Global Behavior
    9.3.3 Constructive Approach
    9.3.4 Streaming Approach
  9.4 Summary

References
Index

Acknowledgments

We would like to recognize the International, Space, and Response (ISR) Technologies Division and the Laboratory-Directed Research and Development (LDRD) Program at Los Alamos National Laboratory for their invaluable support during the writing and editing of this book.

We would like to acknowledge the contributions of two invited chapter authors. Reid B. Porter from Los Alamos National Laboratory wrote Chapter 6, Image Processing, providing an excellent discussion of how reconfigurable computing has been employed in the broad field of image processing.
Dominique Lavenier and Mathieu Giraud from IRISA, Rennes, France, wrote Chapter 8, Bioinformatics Applications, drawing on their extensive background in bioinformatics to describe several applications from the field and the role of reconfigurable computing in these applications.

The material in Chapter 9, Supercomputing Applications, is derived from two papers written by researchers at Los Alamos National Laboratory. The first paper, “Accelerating Monte Carlo Radiative Heat Transfer Simulation on a Reconfigurable Computer: An Evaluation”, was written by Maya Gokhale, Janette Frigo, Christine Ahrens, Justin L. Tripp, and Ronald G. Minnich and was published in the Proceedings of the 2004 International Conference on Field-Programmable Logic and Applications. The second paper, “Acceleration of Traffic Simulation on Reconfigurable Hardware”, was written by Justin L. Tripp, Henning S. Mortveit, Matthew S. Nassr, Anders A. Hansson, and Maya Gokhale and was presented at the 2004 International Conference on Military and Aerospace Programmable Logic Devices.

Thanks are due to Janette Frigo for the phase modulation sorter example and to Kris Gaj and Peter Bellows for very helpful discussions on cryptography and network security.

We would also like to acknowledge the support from our families during the pursuit of this endeavor. The importance of their support during this project cannot be overstated. Of special note, Ron Minnich gave up many hours reading through the manuscript and providing us valuable feedback, while Samara Graham took on the extra duty of trying to keep three children in line while Paul worked many late evenings.

This book is dedicated to our families.

1 An Introduction to Reconfigurable Computing

Reconfigurable Computing (RC), the use of programmable logic to accelerate computation, arose in the late 1980s with the widespread commercial availability of Field-Programmable Gate Arrays (FPGAs).
The innovative development of FPGAs whose configuration could be re-programmed an unlimited number of times spurred the invention of a new field in which many different hardware algorithms could execute, in turn, on a single device, just as many different software algorithms can run on a conventional processor.

The speed advantage of direct hardware execution on the FPGA, routinely 10 to 100 times that of the equivalent software algorithm, attracted the attention of the supercomputing community as well as Digital Signal Processing (DSP) systems developers. RC researchers found that FPGAs offer significant advantages over microprocessors and DSPs for high-performance, low-volume applications, particularly for applications that can exploit customized bit widths and massive instruction-level parallelism. An even more compelling argument for using FPGAs as reconfigurable computers has been the commercial availability of devices that continue to track Moore’s Law. FPGAs contain large amounts of SRAM combined with regularly tiled and interconnected logic blocks. These devices follow the International Technology Roadmap for Semiconductors (ITRS) [3] for memory rather than for microprocessors and are often first on new leading-edge fabrication lines. Thus reconfigurable computers advance technologically at a faster rate than microprocessors.

1.1 What is RC?

The speed advantage of FPGAs derives from the fact that the programmable hardware is customized to a particular algorithm. Thus the FPGA can be configured to contain exactly and only those operations that appear in the algorithm. In contrast, the design of a fixed instruction set processor must accommodate all possible operations that an algorithm might require for all possible data types. An FPGA can be configured to perform arbitrary fixed