Table Of ContentPRINCIPLES OF
ASYNCHRONOUS CIRCUIT DESIGN
– A Systems Perspective
Editedby
JENSSPARSØ
TechnicalUniversityofDenmark
STEVEFURBER
TheUniversityofManchester,UK
KluwerAcademicPublishers
Boston/Dordrecht/London
Contents
Preface xi
PartI Asynchronouscircuitdesign–Atutorial
Author: JensSparsø
1
Introduction 3
1.1 Whyconsiderasynchronouscircuits? 3
1.2 Aimsandbackground 4
1.3 Clockingversushandshaking 5
1.4 OutlineofPartI 8
2
Fundamentals 9
2.1 Handshakeprotocols 9
2.1.1 Bundled-dataprotocols 9
2.1.2 The4-phasedual-railprotocol 11
2.1.3 The2-phasedual-railprotocol 13
2.1.4 Otherprotocols 13
2.2 TheMullerC-elementandtheindicationprinciple 14
2.3 TheMullerpipeline 16
2.4 Circuitimplementationstyles 17
2.4.1 4-phasebundled-data 18
2.4.2 2-phasebundleddata(Micropipelines) 19
2.4.3 4-phasedual-rail 20
2.5 Theory 23
2.5.1 Thebasicsofspeed-independence 23
2.5.2 Classificationofasynchronouscircuits 25
2.5.3 Isochronicforks 26
2.5.4 Relationtocircuits 26
2.6 Test 27
2.7 Summary 28
3
Staticdata-flowstructures 29
3.1 Introduction 29
3.2 Pipelinesandrings 30
v
vi PRINCIPLESOFASYNCHRONOUSCIRCUITDESIGN
3.3 Buildingblocks 31
3.4 Asimpleexample 33
3.5 Simpleapplicationsofrings 35
3.5.1 Sequentialcircuits 35
3.5.2 Iterativecomputations 35
3.6 FOR,IF,andWHILEconstructs 36
3.7 Amorecomplexexample: GCD 38
3.8 Pointerstoadditionalexamples 39
3.8.1 Alow-powerfilterbank 39
3.8.2 Anasynchronousmicroprocessor 39
3.8.3 Afine-grainpipelinedvectormultiplier 40
3.9 Summary 40
4
Performance 41
4.1 Introduction 41
4.2 Aqualitativeviewofperformance 42
4.2.1 Example1: AFIFOusedasashiftregister 42
4.2.2 Example2: Ashiftregisterwithparallelload 44
4.3 Quantifyingperformance 47
4.3.1 Latency,throughputandwavelength 47
4.3.2 Cycletimeofaring 49
4.3.3 Example3: Performanceofa3-stagering 51
4.3.4 Finalremarks 52
4.4 Dependencygraphanalysis 52
4.4.1 Example4: Dependencygraphforapipeline 52
4.4.2 Example5: Dependencygraphfora3-stagering 54
4.5 Summary 56
5
Handshakecircuitimplementations 57
5.1 Thelatch 57
5.2 Fork,join,andmerge 58
5.3 Functionblocks–Thebasics 60
5.3.1 Introduction 60
5.3.2 Transparencytohandshaking 61
5.3.3 Reviewofripple-carryaddition 64
5.4 Bundled-datafunctionblocks 65
5.4.1 Usingmatcheddelays 65
5.4.2 Delayselection 66
5.5 Dual-railfunctionblocks 67
5.5.1 Delayinsensitivemintermsynthesis(DIMS) 67
5.5.2 NullConventionLogic 69
5.5.3 Transistor-levelCMOSimplementations 70
5.5.4 Martin’sadder 71
5.6 Hybridfunctionblocks 73
5.7 MUXandDEMUX 75
5.8 Mutualexclusion,arbitrationandmetastability 77
5.8.1 Mutualexclusion 77
5.8.2 Arbitration 79
5.8.3 Probabilityofmetastability 79
Contents vii
5.9 Summary 80
6
Speed-independentcontrolcircuits 81
6.1 Introduction 81
6.1.1 Asynchronoussequentialcircuits 81
6.1.2 Hazards 82
6.1.3 Delaymodels 83
6.1.4 Fundamentalmodeandinput-outputmode 83
6.1.5 Synthesisoffundamentalmodecircuits 84
6.2 Signaltransitiongraphs 86
6.2.1 PetrinetsandSTGs 86
6.2.2 SomefrequentlyusedSTGfragments 88
6.3 Thebasicsynthesisprocedure 91
6.3.1 Example1: aC-element 92
6.3.2 Example2: acircuitwithchoice 92
6.3.3 Example2: Hazardsinthesimplegateimplementation 94
6.4 Implementationsusingstate-holdinggates 96
6.4.1 Introduction 96
6.4.2 Excitationregionsandquiescentregions 97
6.4.3 Example2: Usingstate-holdingelements 98
6.4.4 Themonotoniccoverconstraint 98
6.4.5 Circuittopologiesusingstate-holdingelements 99
6.5 Initialization 101
6.6 Summaryofthesynthesisprocess 101
6.7 Petrify: AtoolforsynthesizingSIcircuitsfromSTGs 102
6.8 DesignexamplesusingPetrify 104
6.8.1 Example2revisited 104
6.8.2 Controlcircuitfora4-phasebundled-datalatch 106
6.8.3 Controlcircuitfora4-phasebundled-dataMUX 109
6.9 Summary 113
7
Advanced4-phasebundled-data 115
protocolsandcircuits
7.1 Channelsandprotocols 115
7.1.1 Channeltypes 115
7.1.2 Data-validityschemes 116
7.1.3 Discussion 116
7.2 Statictypechecking 118
7.3 Moreadvancedlatchcontrolcircuits 119
7.4 Summary 121
8
High-levellanguagesandtools 123
8.1 Introduction 123
8.2 ConcurrencyandmessagepassinginCSP 124
8.3 Tangram: programexamples 126
8.3.1 A2-placeshiftregister 126
8.3.2 A2-place(ripple)FIFO 126
viii PRINCIPLESOFASYNCHRONOUSCIRCUITDESIGN
8.3.3 GCDusingwhileandifstatements 127
8.3.4 GCDusingguardedcommands 128
8.4 Tangram: syntax-directedcompilation 128
8.4.1 The2-placeshiftregister 129
8.4.2 The2-placeFIFO 130
8.4.3 GCDusingguardedrepetition 131
8.5 Martin’stranslationprocess 133
8.6 UsingVHDLforasynchronousdesign 134
8.6.1 Introduction 134
8.6.2 VHDLversusCSP-typelanguages 135
8.6.3 Channelcommunicationanddesignflow 136
8.6.4 Theabstractchannelpackage 138
8.6.5 Therealchannelpackage 142
8.6.6 Partitioningintocontrolanddata 144
8.7 Summary 146
Appendix: TheVHDLchannelpackages 148
A.1 Theabstractchannelpackage 148
A.2 Therealchannelpackage 150
PartII Balsa-AnAsynchronousHardwareSynthesisSystem
Author: DougEdwards,AndrewBardsley
9
AnintroductiontoBalsa 155
9.1 Overview 155
9.2 Basicconcepts 156
9.3 Toolsetanddesignflow 159
9.4 Gettingstarted 159
9.4.1 Asingle-placebuffer 161
9.4.2 Two-placebuffers 163
9.4.3 Parallelcompositionandmodulereuse 164
9.4.4 Placingmultiplestructures 165
9.5 AncillaryBalsatools 166
9.5.1 Makefilegeneration 166
9.5.2 Estimatingareacost 167
9.5.3 Viewingthehandshakecircuitgraph 168
9.5.4 Simulation 168
10
TheBalsalanguage 173
10.1 Datatypes 173
10.2 Datatypingissues 176
10.3 Controlflowandcommands 178
10.4 Binary/unaryoperators 181
10.5 Programstructure 181
10.6 Examplecircuits 183
10.7 Selectingchannels 190
Contents ix
11
Buildinglibrarycomponents 193
11.1 Parameteriseddescriptions 193
11.1.1 Avariablewidthbufferdefinition 193
11.1.2 Pipelinesofvariablewidthanddepth 194
11.2 Recursivedefinitions 195
11.2.1 Ann-waymultiplexer 195
11.2.2 Apopulationcounter 197
11.2.3 ABalsashifter 200
11.2.4 Anarbitertree 202
12
AsimpleDMAcontroller 205
12.1 Globalregisters 205
12.2 Channelregisters 206
12.3 DMAcontrollerstructure 207
12.4 TheBalsadescription 211
12.4.1 Arbitertree 211
12.4.2 Transferengine 212
12.4.3 Controlunit 213
PartIII Large-ScaleAsynchronousDesigns
13
Descale 221
JoepKessels&AdPeeters,TorstenKramerandVolkerTimm
13.1 Introduction 222
13.2 VLSIprogrammingofasynchronouscircuits 223
13.2.1 TheTangramtoolset 223
13.2.2 Handshaketechnology 225
13.2.3 GCDalgorithm 226
13.3 Opportunitiesforasynchronouscircuits 231
13.4 Contactlesssmartcards 232
13.5 Thedigitalcircuit 235
13.5.1 The80C51microcontroller 236
13.5.2 Theprefetchunit 239
13.5.3 TheDEScoprocessor 241
13.6 Results 243
13.7 Test 245
13.8 Thepowersupplyunit 246
13.9 Conclusions 247
14
AnAsynchronousViterbiDecoder 249
LindaE.M.Brackenbury
14.1 Introduction 249
14.2 TheViterbidecoder 250
14.2.1 Convolutionencoding 250
14.2.2 Decoderprinciple 251
14.3 Systemparameters 253
14.4 Systemoverview 254
x PRINCIPLESOFASYNCHRONOUSCIRCUITDESIGN
14.5 ThePathMetricUnit(PMU) 256
14.5.1 NodepairdesigninthePMU 256
14.5.2 Branchmetrics 259
14.5.3 Slottiming 261
14.5.4 Globalwinneridentification 262
14.6 TheHistoryUnit(HU) 264
14.6.1 Principleofoperation 264
14.6.2 HistoryUnitbacktrace 264
14.6.3 HistoryUnitimplementation 267
14.7 Resultsanddesignevaluation 269
14.8 Conclusions 271
14.8.1 Acknowledgement 272
14.8.2 Furtherreading 272
15
Processors 273
JimD.Garside
15.1 AnintroductiontotheAmuletprocessors 274
15.1.1 Amulet1(1994) 274
15.1.2 Amulet2e(1996) 275
15.1.3 Amulet3i(2000) 275
15.2 Someotherasynchronousmicroprocessors 276
15.3 Processorsasdesignexamples 278
15.4 Processorimplementationtechniques 279
15.4.1 Pipeliningprocessors 279
15.4.2 Asynchronouspipelinearchitectures 281
15.4.3 Determinismandnon-determinism 282
15.4.4 Dependencies 288
15.4.5 Exceptions 297
15.5 Memory–acasestudy 302
15.5.1 Sequentialaccesses 302
15.5.2 TheAmulet3iRAM 303
15.5.3 Cache 307
15.6 Largerasynchronoussystems 310
15.6.1 System-on-Chip(DRACO) 310
15.6.2 Interconnection 310
15.6.3 BalsaandtheDMAcontroller 312
15.6.4 Calibratedtimedelays 313
15.6.5 Productiontest 314
15.7 Summary 315
Epilogue 317
References 319
Index 333
Preface
Thisbookwascompiledtoaddressaperceivedneedforanintroductorytext
onasynchronousdesign. Thereareseveralhighlytechnicalbooksonaspectsof
thesubject,butnoobvious startingpointforadesigner whowishestobecome
acquaintedforthefirsttimewithasynchronous technology. Wehopethisbook
willserveasthatstarting point.
The reader is assumed to have some background in digital design. We as-
sumethat concepts such aslogic gates, flip-flopsand Boolean logic are famil-
iar. Someofthelattersectionsalsoassumefamiliaritywiththehigherlevelsof
digital design such as microprocessor architectures and systems-on-chip, but
readers unfamiliar with these topics should still find the majority of the book
accessible.
Theintended audience forthebookcomprises thefollowing groups:
Industrial designers withabackground inconventional (clocked) digital
design who wish to gain an understanding of asynchronous design in
order, for example, to establish whether or not it may be advantageous
touseasynchronous techniques intheirnextdesign task.
Students in Electronic and/or Computer Engineering who are taking a
course thatincludes aspects ofasynchronous design.
The book is structured in three parts. Part I is a tutorial in asynchronous
design. Itaddresses themostimportant issueforthebeginner,whichishowto
think about asynchronous systems. Thefirstbig hurdle to be cleared isthat of
mindset –asynchronous design requires adifferent mental approach fromthat
normally employed in clocked design. Attempts to take an existing clocked
system,stripouttheclockandsimplyreplaceitwithasynchronoushandshakes
are doomed to disappoint. Another hurdle is that of circuit design methodol-
ogy–theexisting body ofliterature presents anapparent plethora ofdisparate
approaches. Theaimofthetutorial istogetbehindthisandtopresentasingle
unified and coherent perspective which emphasizes the common ground. In
this way the tutorial should enable the reader to begin to understand the char-
acteristics of asynchronous systems in a way that will enable them to ‘think
xi
xii PRINCIPLESOFASYNCHRONOUSCIRCUITDESIGN
outside the box’ of conventional clocked design and to create radical new de-
signsolutions thatfully exploit thepotential ofclockless systems.
Once the asynchronous design mindset has been mastered, the second hur-
dle is designer productivity. VLSI designers are used to working in a highly
productiveenvironmentsupportedbypowerfulautomatictools. Asynchronous
design lags in its tools environment, but things are improving. Part II of the
book gives an introduction to Balsa, a high-level synthesis system for asyn-
chronous circuits. Itiswritten byDougEdwards(whohasmanaged theBalsa
development at the University of Manchester since its inception) and Andrew
Bardsley(whohaswrittenmostofthesoftware). Balsaisnotthesolutiontoall
asynchronous design problems, but itiscapable of synthesizing very complex
systems (for example, the 32-channel DMA controller used on the DRACO
chipdescribedinChapter15)anditisagoodwaytodevelopanunderstanding
ofasynchronous design ‘inthelarge’.
Knowinghowtothinkaboutasynchronous designandhavingaccesstosuit-
able tools leaves one question: what can be built in this way? In Part III we
offer a number of examples of complex asynchronous systems as illustrations
of the answer to this question. In each of these examples the designers have
been asked to provide descriptions that will provide the reader with insights
into the design process. The examples include a commercial smart card chip
designedatPhilipsandaViterbidecoderdesignedattheUniversityofManch-
ester. PartIII closes withadiscussion ofthe issues that come up inthe design
ofadvancedasynchronous microprocessors, focusingontheAmuletprocessor
series,again developed attheUniversity ofManchester.
Although thebook isacompilation of contributions from different authors,
eachofthesehasbeenspecificallywrittenwiththegoalsofthebookinmind–
to provide answers to the sorts of questions that a newcomer to asynchronous
design is likely to ask. In order to keep the book accessible and to avoid it
becominganintimidatingsize,muchvaluableworkhashadtobeomitted. Our
objective inintroducing youtoasynchronous design isthatyoumightbecome
acquainted withit. Ifyour relationship develops further, perhaps eveninto the
full-blownaffairthathassmittenafew,includedamongwhosenumberarethe
contributors to this book, you will, of course, want to know more. The book
includes an extensive bibliography that will provide food enough for even the
mostinsatiable ofappetites.
JENSSPARSØANDSTEVEFURBER,SEPTEMBER2001
xiii
Acknowledgments
Manypeople havehelped significantly inthecreation ofthisbook. Inaddi-
tion to writing their respective chapters, several of the authors have also read
andcommentedondraftsofotherpartsofthebook,andthequalityofthework
asawholehasbeenenhanced asaresult.
The editors are also grateful to Alan Williams, Russell Hobson and Steve
Temple, for their careful reading of drafts of this book and their constructive
suggestions forimprovement.
Part I of the book has been used as a course text and the quality and con-
sistency of the content improved by feedback from the students on the spring
2001course “49425 DesignofAsynchronous Circuits” atDTU.
Anyremaining errorsoromissions aretheresponsibility oftheeditors.
The writing of this book was initiated as a dissemination effort within the
EuropeanLow-PowerInitiativeforElectronicSystemDesign(ESD-LPD),and
this book is part of the book series from this initiative. As will become clear,
the book goes far beyond the dissemination of results from projects within
in the ESD-LPD cluster, and the editors would like to acknowledge the sup-
portoftheworkinggrouponasynchronous circuitdesign,ACiD-WG,thathas
providedafruitfulforumforinteractionandtheexchangeofideas. TheACiD-
WG has been funded by the European Commission since 1992 under several
Framework Programmes: FP3 Basic Research (EP7225), FP4 Technologies
for Components and Subsystems (EP21949), and FP5 Microelectronics (IST-
1999-29119).