Acceleration of Biomedical Image
Processing with Dataflow
on FPGAs
RIVER PUBLISHERS SERIES IN INFORMATION
SCIENCE AND TECHNOLOGY
Volume 22
Series Editors
K.C. CHEN
National Taiwan University
Taipei, Taiwan
SANDEEP SHUKLA
Virginia Tech
USA
CHRISTOPHE BOBDA
University of Arkansas
USA
The “River Publishers Series in Information Science and Technology” covers research which
ushers the 21st Century into an Internet and multimedia era. Multimedia means the theory
and application of filtering, coding, estimating, analyzing, detecting and recognizing,
synthesizing, classifying, recording, and reproducing signals by digital and/or analog devices or
techniques, while the scope of “signal” includes audio, video, speech, image, musical,
multimedia, data/content, geophysical, sonar/radar, bio/medical, sensation, etc. Networking suggests
transportation of such multimedia contents among nodes in communication and/or computer
networks, to facilitate the ultimate Internet.
Theory, technologies, protocols and standards, applications/services, practice and
implementation of wired/wireless networking are all within the scope of this series. Based on network and
communication science, we further extend the scope for 21st Century life through the
knowledge in robotics, machine learning, embedded systems, cognitive science, pattern recognition,
quantum/biological/molecular computation and information processing, biology, ecology, social
science and economics, user behaviors and interface, and applications to health and society
advance.
Books published in the series include research monographs, edited volumes, handbooks and
textbooks. The books provide professionals, researchers, educators, and advanced students in the
field with an invaluable insight into the latest research and developments.
Topics covered in the series include, but are by no means restricted to the following:
• Communication/Computer Networking Technologies and Applications
• Queuing Theory
• Optimization
• Operation Research
• Stochastic Processes
• Information Theory
• Multimedia/Speech/Video Processing
• Computation and Information Processing
• Machine Intelligence
• Cognitive Science and Brain Science
• Embedded Systems
• Computer Architectures
• Reconfigurable Computing
• Cyber Security
For a list of other books in this series, visit www.riverpublishers.com
Acceleration of Biomedical Image
Processing with Dataflow
on FPGAs
Frederik Grüll
Goethe University Frankfurt,
Germany
Udo Kebschull
Goethe University Frankfurt,
Germany
River Publishers
Published 2016 by River Publishers
River Publishers
Alsbjergvej 10, 9260 Gistrup, Denmark
www.riverpublishers.com
Distributed exclusively by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
605 Third Avenue, New York, NY 10017, USA
Acceleration of Biomedical Image Processing with Dataflow on FPGAs / by
Frederik Grüll, Udo Kebschull.
© 2016 River Publishers. All rights reserved. No part of this publication may
be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, mechanical, photocopying, recording or otherwise, without prior
written permission of the publishers.
Routledge is an imprint of the Taylor & Francis Group, an informa
business
ISBN 978-87-93379-36-7 (print)
While every effort is made to provide dependable information, the
publisher, authors, and editors cannot be held responsible for any errors
or omissions.
Contents
Foreword xi
Preface xiii
Acknowledgments xv
List of Figures xvii
List of Tables xxi
List of Listings xxiii
List of Abbreviations xxv
1 Introduction 1
1.1 Motivation. . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 The Idea . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Aim of this Book . . . . . . . . . . . . . . . . . . . 3
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Dataflow Computing 7
2.1 Early Approaches . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Control Flow and Dataflow . . . . . . . . . . . . . . 7
2.1.2 Dataflow Machines . . . . . . . . . . . . . . . . . . 9
2.1.3 Dataflow Programs . . . . . . . . . . . . . . . . . . 11
2.2 Principles of Dataflow Computing on Reconfigurable Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Primitives . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Scheduling . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2.1 Dynamic scheduling . . . . . . . . . . . . 15
2.2.2.2 Static scheduling . . . . . . . . . . . . . . 15
2.2.2.3 Combined forms . . . . . . . . . . . . . . 16
2.2.3 Image Processing . . . . . . . . . . . . . . . . . . . 16
2.2.3.1 Point operations . . . . . . . . . . . . . . 16
2.2.3.2 Convolutions . . . . . . . . . . . . . . . . 17
2.2.3.3 Reductions . . . . . . . . . . . . . . . . . 18
2.2.3.4 Operations with non-linear access patterns . . . . . . . . . . . . . . . . . . 18
2.3 FPGA Hardware . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Integrated Circuits . . . . . . . . . . . . . . . . . . 19
2.3.1.1 Configurable logic blocks . . . . . . . . . 20
2.3.1.2 Block RAM . . . . . . . . . . . . . . . . 21
2.3.1.3 Digital signal processors . . . . . . . . . . 22
2.3.2 Low-Level Hardware Description Languages . . . . 23
2.3.2.1 VHDL and Verilog . . . . . . . . . . . . . 23
2.3.2.2 FPGA design flow . . . . . . . . . . . . . 25
2.3.3 FPGAs as Application Accelerators . . . . . . . . . 26
2.3.3.1 Pipelining . . . . . . . . . . . . . . . . . 28
2.3.3.2 Flynn’s taxonomy . . . . . . . . . . . . . 30
2.3.3.3 Limits of acceleration . . . . . . . . . . . 31
2.4 Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Imperative Languages . . . . . . . . . . . . . . . . 34
2.4.1.1 Handel-C . . . . . . . . . . . . . . . . . . 34
2.4.1.2 Xilinx Vivado high-level synthesis . . . . 35
2.4.1.3 ROCCC 2.0 . . . . . . . . . . . . . . . . 36
2.4.2 Stream Languages . . . . . . . . . . . . . . . . . . 37
2.4.2.1 MaxCompiler . . . . . . . . . . . . . . . 38
2.4.2.2 Silicon Software VisualApplets . . . . . . 40
3 Acceleration of Imperative Code with Dataflow Computing 43
3.1 Relation to List Processing . . . . . . . . . . . . . . . . . . 43
3.1.1 Basic Functions . . . . . . . . . . . . . . . . . . . . 46
3.1.2 Transformations . . . . . . . . . . . . . . . . . . . 47
3.1.2.1 Nested lists . . . . . . . . . . . . . . . . . 48
3.1.3 Reductions . . . . . . . . . . . . . . . . . . . . . . 49
3.1.4 Generation . . . . . . . . . . . . . . . . . . . . . . 50
3.1.5 Sublists . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.6 Searching . . . . . . . . . . . . . . . . . . . . . . . 53
3.1.6.1 Indexing lists . . . . . . . . . . . . . . . 54
3.1.7 Zipping and Unzipping . . . . . . . . . . . . . . . . 54
3.1.8 Set Operations . . . . . . . . . . . . . . . . . . . . 55
3.1.9 Ordered Lists . . . . . . . . . . . . . . . . . . . . . 56
3.1.10 Summary . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Identification of Throughput Boundaries . . . . . . . . . . . 57
3.2.1 Profiling in Software . . . . . . . . . . . . . . . . . 59
3.2.2 Profiling the CPU System . . . . . . . . . . . . . . 61
3.2.3 Profiling Dataflow Designs . . . . . . . . . . . . . . 62
3.3 Pipelining Imperative Control Flows . . . . . . . . . . . . . 63
3.3.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . 66
3.3.2 Conditionals . . . . . . . . . . . . . . . . . . . . . 68
3.3.3 Loops . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.3.1 Loop unrolling . . . . . . . . . . . . . . . 72
3.3.3.2 Loop parallelization . . . . . . . . . . . . 73
3.3.3.3 Loop cascading . . . . . . . . . . . . . . 76
3.3.3.4 Loop tiling . . . . . . . . . . . . . . . . . 79
3.3.3.5 Loop interweaving . . . . . . . . . . . . . 83
3.3.3.6 Finite-state machines . . . . . . . . . . . 83
3.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . 84
3.4 Efficient Bit and Number Manipulations . . . . . . . . . . . 85
3.4.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . 85
3.4.1.1 Integers and fixed-point representations . . 86
3.4.1.2 Floating-point representations . . . . . . . 88
3.4.1.3 Alternative encodings . . . . . . . . . . . 90
3.4.2 Dimensioning . . . . . . . . . . . . . . . . . . . . . 91
3.4.2.1 Range . . . . . . . . . . . . . . . . . . . 92
3.4.2.2 Precision . . . . . . . . . . . . . . . . . . 93
3.5 Customizing Memory Access . . . . . . . . . . . . . . . . . 95
3.5.1 Memory Layout and Access Patterns . . . . . . . . . 97
3.5.2 On-Chip Memory . . . . . . . . . . . . . . . . . . . 97
3.5.3 Off-Chip Memory . . . . . . . . . . . . . . . . . . 98
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4 Biomedical Image Processing and Reconstruction 101
4.1 Localization Microscopy . . . . . . . . . . . . . . . . . . . 101
4.1.1 History . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.2 Physical Principles . . . . . . . . . . . . . . . . . . 105
4.1.3 Localization Algorithms . . . . . . . . . . . . . . . 108
4.1.4 Background Removal . . . . . . . . . . . . . . . . . 109
4.1.5 Spot Detection . . . . . . . . . . . . . . . . . . . . 112
4.1.6 Feature Extraction . . . . . . . . . . . . . . . . . . 114
4.1.7 Super-Resolution Image Generation . . . . . . . . . 119
4.1.8 State of the Art . . . . . . . . . . . . . . . . . . . . 120
4.1.9 Analysis of the Algorithm . . . . . . . . . . . . . . 122
4.1.9.1 Methods . . . . . . . . . . . . . . . . . . 123
4.1.9.2 Dataflow . . . . . . . . . . . . . . . . . . 125
4.1.9.3 Dimensioning of the hardware . . . . . . 127
4.1.10 Implementation . . . . . . . . . . . . . . . . . . . . 129
4.1.10.1 Host code . . . . . . . . . . . . . . . . . 130
4.1.10.2 Background removal . . . . . . . . . . . 130
4.1.10.3 Spot detection . . . . . . . . . . . . . . . 131
4.1.10.4 Spot separation . . . . . . . . . . . . . . 133
4.1.10.5 Feature extraction . . . . . . . . . . . . . 134
4.1.10.6 Visualization . . . . . . . . . . . . . . . . 135
4.1.11 Results . . . . . . . . . . . . . . . . . . . . . . . . 136
4.1.11.1 Accuracy . . . . . . . . . . . . . . . . . . 136
4.1.11.2 Throughput . . . . . . . . . . . . . . . . 141
4.1.11.3 Resource usage . . . . . . . . . . . . . . 143
4.1.12 Discussion . . . . . . . . . . . . . . . . . . . . . . 144
4.2 3D Electron Tomography . . . . . . . . . . . . . . . . . . . 145
4.2.1 Reconstruction Algorithms . . . . . . . . . . . . . . 147
4.2.2 State of the Art . . . . . . . . . . . . . . . . . . . . 149
4.2.3 Analysis of the Algorithm . . . . . . . . . . . . . . 151
4.2.3.1 Modifications . . . . . . . . . . . . . . . 151
4.2.3.2 Dataflow . . . . . . . . . . . . . . . . . . 155
4.2.3.3 Dimensioning of the hardware . . . . . . 157
4.2.4 Implementation . . . . . . . . . . . . . . . . . . . . 160
4.2.4.1 Scheduling . . . . . . . . . . . . . . . . . 161
4.2.4.2 External DRAM . . . . . . . . . . . . . . 164
4.2.4.3 Ray–Box intersection . . . . . . . . . . . 165
4.2.4.4 Projection accumulator . . . . . . . . . . 165
4.2.4.5 Residues storage . . . . . . . . . . . . . . 169
4.2.4.6 Multi-piping . . . . . . . . . . . . . . . . 170
4.2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . 171
4.2.5.1 Accuracy . . . . . . . . . . . . . . . . . . 172
4.2.5.2 Throughput . . . . . . . . . . . . . . . . 173
4.2.5.3 Resource usage . . . . . . . . . . . . . . 175
4.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . 176
5 Conclusion 179
5.1 Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.2 High-Level Development . . . . . . . . . . . . . . . . . . . 180
5.3 Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.4 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
References 185
Index 197
About the Authors 199