Table Of ContentField-Programmable Gate Array
Architectures and Algorithms Optimized
for Implementing Datapath Circuits
Andy Gean Ye
November 2004
Field-Programmable Gate Array
Architectures and Algorithms Optimized
for Implementing Datapath Circuits
by
Andy Gean Ye
A thesis submitted in conformity with
the requirements for the degree of
Doctor of Philosophy
November 2004
The Edward S. Rogers Sr. Department of
Electrical and Computer Engineering
University of Toronto
Toronto, Ontario, Canada
© Copyright by Andy Gean Ye 2004
“We are pattern-seeking animals, the descendents of hominids who were especially dex-
terous at making causal links between events in nature. The associations were real often
enough that the ability became engrained in our neural architecture. Unfortunately, the belief
engine sputters occasionally, identifying false patterns as real . . .
The solution is science, our preeminent pattern-discriminating method and our best hope
for detecting a genuine signal within the noise of nature’s cacophony.”
– “Codified Claptrap,” Michael Shermer, Scientific American, June 2003
ii
iii
Abstract
Field-Programmable Gate Arrays (FPGAs) are user-programmable digital devices that
provide efficient, yet flexible, implementations of digital circuits. Over the years, the logic
capacity of FPGAs has been dramatically increased; and currently they are being used to
implement large arithmetic-intensive applications, which contain a greater portion of datapath
circuits. Each circuit, constructed out of multiple identical building blocks called bit-slices,
has highly regular structures. These regular structures have been routinely exploited to
increase speed and area-efficiency in designing custom Application Specific Integrated Cir-
cuits (ASIC).
Previous research suggests that the implementation area of datapath circuits on FPGAs
can also be significantly reduced by exploiting datapath regularity through an architectural
feature called configuration memory sharing (CMS), which takes advantage of datapath regu-
larity by sharing configuration memory bits across, normally independently controlled, recon-
figurable FPGA resources. The results of these studies suggest that CMS can reduce the total
area required to implement a datapath circuit on FPGA by as much as 50%. They, however,
did not take into account detailed implementation issues such as transistor sizing, utilizable
regularity in actual datapath circuits, and Computer-Aided Design (CAD) tool efficiencies.
This study is the first major in-depth study on CMS. The study found that when detailed
implementation issues are taken into account, the actual achievable area savings can be signif-
icant less than the previous estimations — the CMS architecture investigated in this study is
only about 10% more area efficient than a comparable conventional and widely studied FPGA
architecture for implementing datapath circuits. Furthermore, this increase in area efficiency
has a potential speed penalty of around 10%.
iv
To conduct the study, a new area-efficient FPGA architecture is designed along with its
supporting CAD tools. The architecture, called Multi-Bit FPGA (MB-FPGA), is the first com-
pletely specified FPGA architecture that employs CMS routing resources. This sharing signif-
icantly reduces the number of configuration memory bits and consequently increases its area
efficiency.
The use of the CMS resources, however, imposes new demands on the traditional FPGA
CAD algorithms. As a result, a complete set of CAD tools supporting FPGAs containing CMS
resources are proposed and implemented. These tools are designed to extract and utilize datap-
ath regularity for the CMS resources. It is shown that these tools yield excellent results for
implementing a set of realistic datapath circuits on the MB-FPGA architecture.
v
Acknowledgements
I would like to take this opportunity to express my sincere thanks and appreciation to my
academic supervisors. Professor Jonathan S. Rose and Professor David M. Lewis have pro-
vided continual source of guidance, support, advice, and friendship through out my graduate
studies. It has been my privilege to work with these two experienced academics and excellent
engineers. They have made my doctoral studies a truly rewarding and unforgettable experi-
ence. I would especially like to thank Professor Jonathan S. Rose for taking the extra mile to
point out the big pictures in my research and my academic career. I would also like to thank
Professor David M. Lewis for all his extremely detailed and insightful technical advice.
My father and mother have always been a constant support throughout my studies and
my personal life. Their courage, kindness, hard-working ethics, and constant striving for good-
ness, have been a great inspiration to me. I am especially inspired by their courage in over-
coming almost insurmountable difficulties in immigrating and establishing themselves in
Canada. This thesis is as much an achievement of theirs as it is mine.
I would like to thank my academic supervisors, the Natural Sciences and Engineering
Research Council, the Ontario Government, Communications and Information Technology
Ontario, and Micronet for their financial support.
Finally, I would like to thank all my friends for endless hours of play, insightful discus-
sions, rejuvenating lunch outings, friendship, support, and encouragement. Thank you all!
vi
vii
TABLE OF CONTENTS
1 Introduction
1.1 Introduction to Field-Programmable Gate Arrays ......................................................1
1.2 Thesis Motivation .......................................................................................................2
1.3 Research Approach .....................................................................................................3
1.4 Thesis Contribution .....................................................................................................4
1.4 Thesis Organization ....................................................................................................4
2 Background
2.1 Introduction .................................................................................................................7
2.2 FPGA CAD Flow ........................................................................................................7
2.2.1 Synthesis and Technology Mapping ....................................................................9
2.2.1.1 Synopsys FPGA Compiler ............................................................................10
2.2.1.2 Datapath-Oriented Synthesis .........................................................................11
2.2.2 Packing .................................................................................................................12
2.2.3 Placement and Routing ........................................................................................13
2.2.3.1 VPR Placer and Router .................................................................................13
The VPR Placer .....................................................................................................14
The VPR Router ....................................................................................................14
2.3 FPGA Architectures ....................................................................................................15
2.3.1 A Conventional FPGA Architecture ....................................................................16
2.3.1.1 Logic Clusters ...............................................................................................16
Local Routing Network .........................................................................................18
2.3.1.2 Routing Switches ..........................................................................................19
2.3.1.3 Routing Channels ..........................................................................................20
2.3.1.4 Switch Blocks ...............................................................................................22
2.3.1.5 Input and Output Connection Blocks ............................................................24
2.3.1.6 I/O Blocks .....................................................................................................25
2.3.2 DP-FPGA — A Datapath-Oriented FPGA Architecture .....................................26
2.3.2.1 Overview of the Datapath Block ...................................................................26
2.3.2.2 Arithmetic Look-Up Tables ..........................................................................29
2.3.2.3 Logic Blocks .................................................................................................30
Data Connection Blocks ........................................................................................32
Shift Blocks ...........................................................................................................33
2.3.4 Other Datapath-Oriented Field-Programmable Architectures .............................34
2.3.4.1 Processor-Based Architectures .....................................................................35
2.3.4.2 Static ALU-Based Architectures ...................................................................37
2.3.4.3 Dynamic ALU-Based Architectures .............................................................38
2.3.4.4 LUT-Based Architectures .............................................................................39
2.3.4.5 Datapath-Oriented Features on Commercial FPGAs ....................................40
2.3.5 Delay and Area Modeling ....................................................................................41
2.4 Summary .....................................................................................................................42
viii
3 A Datapath-Oriented FPGA Architecture
3.1 Introduction .................................................................................................................43
3.2 Motivation ...................................................................................................................45
3.2.1 Heterogeneous Architecture .................................................................................45
3.2.2 Logic Block Efficiency ........................................................................................46
3.2.3 Parameterization ..................................................................................................48
3.3 Design Goals of MB-FPGA ........................................................................................48
3.4 A Model for Arithmetic-Intensive Applications .........................................................49
3.5 General Approach and Overall Architectural Description ..........................................52
3.5.1 Partitioning Datapath Circuits into Super-Clusters ..............................................52
3.5.2 Implementing Non-Datapath Circuits on MB-FPGA ..........................................54
3.6 The MB-FPGA Architecture .......................................................................................54
3.6.1 Super-Clusters ......................................................................................................55
3.6.1.1 Clusters ..........................................................................................................57
Local Routing Network .........................................................................................58
Carry Network in Detail ........................................................................................59
3.6.1.2 Configuration Memory Sharing ....................................................................60
3.6.2 Routing Switches .................................................................................................60
3.6.3 Routing Channels .................................................................................................61
3.6.4 Switch Blocks ......................................................................................................63
3.6.5 Input and Output Connection Blocks ...................................................................65
3.6.6 I/O Blocks ............................................................................................................67
3.7 Summary .....................................................................................................................68
4 An Area Efficient Synthesis Algorithm for Datapath Circuits
4.1 Introduction .................................................................................................................69
4.2 Motivation and Background .......................................................................................70
4.3 Datapath Circuit Representation .................................................................................73
4.4 The EMC Synthesis Algorithm ...................................................................................75
4.4.1 Word-Level Optimization ....................................................................................76
4.4.1.1 Common Sub-expression Extraction ............................................................76
4.4.1.2 Operation Reordering ....................................................................................79
4.4.2 Module Compaction .............................................................................................81
4.4.3 Bit-Slice Netlist I/O Optimization .......................................................................83
4.5 Experimental Results ..................................................................................................86
4.5.1 Area Inflation .......................................................................................................86
4.5.2 Regularity .............................................................................................................89
4.5.2.1 Logic Regularity ...........................................................................................89
4.5.2.2 Net Regularity ...............................................................................................90
4.6 Conclusion ..................................................................................................................93
5 A Datapath-Oriented Packing Algorithm
5.1 Introduction .................................................................................................................95
5.2 Motivation ...................................................................................................................96
5.3 General Approach and Problem Definition ................................................................99
5.4 Datapath Circuit Representation .................................................................................100
ix
Description:Field-Programmable Gate Arrays (FPGAs) are user-programmable digital
devices that provide efficient, yet flexible, implementations of digital circuits. Over
the