Co-design for System Acceleration
A Quantitative Approach
NADIA NEDJAH
Department of Electronics Engineering and Telecommunications,
State University of Rio de Janeiro, Brazil
LUIZA DE MACEDO MOURELLE
Department of Systems Engineering and Computation,
State University of Rio de Janeiro, Brazil
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-13 978-1-4020-5545-4 (HB)
ISBN-13 978-1-4020-5546-1 (e-book)
Published by Springer,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springer.com
Printed on acid-free paper
All Rights Reserved
© 2007 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
To my mother and sisters,
Nadia
To my father (in memory)
and mother, Luiza
Contents
Dedication v
List of Figures xi
List of Tables xv
Preface xvii
Acknowledgments xix
1. INTRODUCTION 1
1.1 Synthesis 2
1.2 Design Approaches 3
1.3 Co-Design 4
1.3.1 Methodology 5
1.3.2 Simulation 6
1.3.3 Architecture 6
1.3.4 Communication 6
1.4 Structure and Objective 7
2. THE CO-DESIGN METHODOLOGY 9
2.1 The Co-Design Approach 10
2.2 System Specification 11
2.3 Hardware/Software Partitioning 12
2.4 Hardware Synthesis 15
2.4.1 High-Level Synthesis 16
2.4.2 Implementation Technologies 17
2.4.3 Synthesis Systems 20
2.5 Software Compilation 21
2.6 Interface Synthesis 22
2.7 System Integration 23
2.8 Summary 27
3. THE CO-DESIGN SYSTEM 29
3.1 Development Route 30
3.1.1 Hardware/Software Profiling 31
3.1.2 Hardware/Software Partitioning 33
3.1.3 Hardware Synthesis 33
3.1.4 Software Compilation 36
3.1.5 Run-Time System 36
3.2 Target Architecture 37
3.2.1 Microcontroller 38
3.2.2 Global Memory 40
3.2.3 Controllers 41
3.2.4 Bus Interface 42
3.2.5 The Coprocessor 45
3.2.6 The Timer 45
3.3 Performance Results 45
3.3.1 First Benchmark: PLUM Program 46
3.3.2 Second Benchmark: EGCHECK Program 47
3.3.3 Results Analysis 48
3.4 Summary 50
4. VHDL MODEL OF THE CO-DESIGN SYSTEM 53
4.1 Modelling with VHDL 54
4.1.1 Design Units and Libraries 55
4.1.2 Entities and Architectures 55
4.1.3 Hierarchy 57
4.2 The Main System 58
4.3 The Microcontroller 60
4.3.1 Clock and Reset Generator 61
4.3.2 Sequencer 61
4.3.3 Bus Arbiter 65
4.3.4 Memory Read and Write 68
4.4 The Dynamic Memory: DRAM 72
4.5 The Coprocessor 74
4.5.1 Clock Generator 75
4.5.2 Coprocessor Data Buffers 76
4.6 Summary 77
5. SHARED MEMORY CONFIGURATION 81
5.1 Case Study 82
5.2 Timing Characteristics 85
5.2.1 Parameter Passing 87
5.2.2 Bus Arbitration 87
5.2.3 Busy-Wait Mechanism 88
5.2.4 Interrupt Mechanism 90
5.3 Relating Memory Accesses and Interface Mechanisms 92
5.3.1 Varying Internal Operations and Memory Accesses 94
5.3.2 Varying the Coprocessor Memory Access Rate 96
5.3.3 Varying the Number of Coprocessor Memory Accesses 98
5.4 Summary 105
6. DUAL-PORT MEMORY CONFIGURATION 107
6.1 General Description 108
6.1.1 Contention Arbitration 108
6.1.2 Read/Write Operations 110
6.2 The System Architecture 111
6.2.1 Dual-Port Memory Model 111
6.2.2 The Coprocessor 113
6.2.3 Bus Interface Controller 113
6.2.4 Coprocessor Memory Controller 114
6.2.5 The Main Controller 117
6.3 Timing Characteristics 118
6.3.1 Interface Mechanisms 120
6.4 Performance Results 121
6.4.1 Varying Internal Operations and Memory Accesses 121
6.4.2 Varying the Memory Access Rate 126
6.4.3 Varying the Number of Memory Accesses 127
6.4.4 Speedup Achieved 129
6.5 Summary 130
7. CACHE MEMORY CONFIGURATION 133
7.1 Memory Hierarchy Design 134
7.1.1 General Principles 134
7.1.2 Cache Memory 135
7.2 System Organization 138
7.2.1 Cache Memory Model 138
7.2.2 The Coprocessor 139
7.2.3 Coprocessor Memory Controller 140
7.2.4 The Bus Interface Controller 145
7.3 Timing Characteristics 150
7.3.1 Block Transfer During Handshake Completion 155
7.4 Performance Results 158
7.4.1 Varying the Number of Addressed Locations 159
7.4.2 Varying the Block Size 162
7.4.3 Varying the Number of Memory Accesses 164
7.4.4 Speedup Achieved 166
7.4.5 Miss Rate with Random Address Locations 167
7.5 Summary 169
8. ADVANCED TOPICS AND FURTHER RESEARCH 173
8.1 Conclusions and Achievements 173
8.2 Advanced Topics and Further Research 176
8.2.1 Complete VHDL Model 176
8.2.2 Cost Evaluation 177
8.2.3 New Configurations 177
8.2.4 Interface Synthesis 177
8.2.5 Architecture Synthesis 177
8.2.6 Framework for Co-design 177
8.2.7 General Formalization 178
Appendices 185
A Benchmark Programs 185
B Top-Level VHDL Model of the Co-design System 191
C Translating PALASM™ into VHDL 199
D VHDL Version of the Case Study 205
References 219
Index 225
List of Figures
2.1 The co-design flow 11
2.2 Typical CLB connections to adjacent lines 19
2.3 Intel FLEXlogic iFX780 configuration 21
2.4 Target architecture with parameter memory in the coprocessor 24
2.5 Target architecture with memory-mapped parameter registers 25
2.6 Target architecture using a general-purpose processor and ASICs 26
3.1 Development route 31
3.2 Hardware synthesis process 34
3.3 Run-time system 37
3.4 Run-time system and interfaces 38
3.5 Target architecture 39
3.6 Bus interface control register 43
4.1 Main system configuration 59
4.2 Coprocessor board components 60
4.3 Logic symbol for the microcontroller 60
4.4 VHDL model for the clock and reset generator 62
4.5 Writing into the coprocessor control register 63
4.6 The busy-wait model 64
4.7 The interrupt routine model 65
4.9 Algorithmic state machine for the bus arbiter 67
4.9 Algorithmicstatemachineforthebusarbiter 67
4.10 Algorithmic state machine for memory read/write 70