
Approximate Arithmetic Circuit Architectures for FPGA-based Systems PDF

189 Pages·2023·11.146 MB·English

Preview Approximate Arithmetic Circuit Architectures for FPGA-based Systems

Approximate Arithmetic Circuit Architectures for FPGA-based Systems

Salim Ullah • Akash Kumar

Salim Ullah, TU Dresden, Dresden, Germany
Akash Kumar, TU Dresden, Dresden, Germany

ISBN 978-3-031-21293-2
ISBN 978-3-031-21294-9 (eBook)
https://doi.org/10.1007/978-3-031-21294-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Dedicated to our families and all those who always give it one more try

Preface

From the initial computing machines, Colossus of 1943 and ENIAC of 1945, to modern high-performance data centers and the Internet of Things (IoT), four design goals, i.e., high performance, energy efficiency, resource utilization, and ease of programmability, have remained a beacon of development for the computing industry. During this period, the computing industry has exploited the advantages of technology scaling and microarchitectural enhancements to achieve these goals. However, with the end of Dennard scaling, these techniques have diminishing energy and performance advantages. Therefore, it is necessary to explore alternative techniques for satisfying the computational and energy requirements of modern applications.

Towards this end, one promising technique is analyzing and surrendering the strict notion of correctness in various layers of the computation stack. Most modern applications across the computing spectrum, from data centers to IoT devices, interact with and analyze real-world data and make decisions accordingly. These applications are broadly classified as Recognition, Mining, and Synthesis (RMS). Instead of producing a single golden answer, these applications produce several feasible answers. These applications possess an inherent error-resilience to the inexactness of processed data and corresponding operations. Utilizing these applications' inherent error-resilience, the paradigm of approximate computing relaxes the strict notion of computation correctness to realize high-performance and energy-efficient systems with acceptable quality outputs.

The prior works on circuit-level approximations have mainly focused on Application-specific Integrated Circuits (ASICs).
However, ASIC-based solutions suffer from long time-to-market and high-cost development cycles. These limitations of ASICs can be overcome by utilizing the reconfigurable nature of Field Programmable Gate Arrays (FPGAs). However, due to architectural differences between ASICs and FPGAs, the utilization of ASIC-based approximation techniques for FPGA-based systems does not result in proportional performance and energy gains. Therefore, to exploit the principles of approximate computing for FPGA-based hardware accelerators for error-resilient applications, FPGA-optimized approximation techniques are required. Further, most state-of-the-art approximate arithmetic operators do not have a generic approximation methodology to implement new approximate designs for an application's changing accuracy and performance requirements. These works also lack a methodology where a machine learning model can be used to correlate an approximate operator with its impact on the output quality of an application. This book focuses on these research challenges by designing and exploring FPGA-optimized logic-based approximate arithmetic operators. Because multiplication is one of the most computationally complex and frequently used arithmetic operations in modern applications, such as Artificial Neural Networks (ANNs), we consider it for most of the approximation techniques proposed in this book.

The primary focus of the work is to provide a framework for generating FPGA-optimized approximate arithmetic operators and efficient techniques to explore approximate operators for implementing hardware accelerators for error-resilient applications. Towards this end, we first present various designs of resource-optimized, high-performance, and energy-efficient accurate multipliers.
Although modern FPGAs host high-performance Digital Signal Processing (DSP) blocks to perform multiplication and other arithmetic operations, our analysis and results show that the orthogonal approach of having resource-efficient and high-performance multipliers is necessary for implementing high-performance accelerators. Due to the differences in the type of data processed by various applications, the book presents individual designs for unsigned, signed, and constant multipliers. Compared to the multiplier IPs provided by the FPGA synthesis tool, our proposed designs provide significant performance gains. We then explore the designed accurate multipliers and provide a library of approximate unsigned/signed multipliers. The proposed approximations target the reduction in the total utilized resources, critical path delay, and energy consumption of the multipliers. We have explored various statistical error metrics to characterize the approximation-induced accuracy degradation of the approximate multipliers. We have also utilized the designed multipliers in various error-resilient applications to evaluate their impact on applications' output quality and performance.

Based on our analysis of the designed approximate multipliers, we identify the need for a framework to design application-specific approximate arithmetic operators. An application-specific approximate arithmetic operator intends to implement only the logic that can satisfy the application's overall output accuracy and performance constraints. Towards this end, we present a generic design methodology for implementing FPGA-based application-specific approximate arithmetic operators from their accurate implementations according to the applications' accuracy and performance requirements. In this regard, we utilize various machine learning models to identify feasible approximate arithmetic configurations for various applications.
We also utilize different machine learning models and optimization techniques to efficiently explore the large design space of individual operators and their utilization in various applications. In this book, we have used the proposed methodology to design approximate adders and multipliers.

This book also explores other layers of the computation stack (cross-layer) for possible approximations to satisfy an application's accuracy and performance requirements. Towards this end, we present a framework to allow the intelligent exploration and highly accurate identification of the feasible design points in the large design space enabled by cross-layer approximations. The proposed framework utilizes a novel Polynomial Regression (PR)-based method to model approximate arithmetic operators. The PR-based representation enables machine learning models to better correlate an approximate operator's coefficients with their impact on an application's output quality.

Dresden, Germany    Salim Ullah
Dresden, Germany    Akash Kumar
September 2022

Acknowledgments

We would like to thank our group members and collaborators, especially Dr. Siva Satyendra Sahoo, for their continued support in realizing this work.

Contents

1 Introduction
  1.1 Introduction
  1.2 Inherent Error Resilience of Applications
  1.3 Approximate Computing Paradigm
    1.3.1 Error-Resilient Computing
    1.3.2 Stochastic Computing
    1.3.3 Approximate Computing
    1.3.4 Software Layer Approximation
    1.3.5 Architecture Layer Approximation
    1.3.6 Circuit Layer Approximation
  1.4 Problem Statement
    1.4.1 Research Challenge
  1.5 Focus of the Book
  1.6 Key Contributions and Book Overview
  References
2 Preliminaries
  2.1 Introduction
  2.2 Xilinx FPGA Slice Structure
  2.3 Multiplication Algorithms
    2.3.1 Baugh-Wooley's Multiplication Algorithm
    2.3.2 Booth's Multiplication Algorithm
    2.3.3 Sign Extension for Booth's Multiplier
  2.4 Statistical Error Metrics
  2.5 Design Space Exploration and Optimization Techniques
    2.5.1 Genetic Algorithm
    2.5.2 Bayesian Optimization
  2.6 Artificial Neural Networks
  References
3 Accurate Multipliers
  3.1 Introduction
  3.2 Contributions
  3.3 Related Work
  3.4 Unsigned Multiplier Architecture
  3.5 Motivation for Signed Multipliers
  3.6 Baugh-Wooley's Multiplier: Mult-BW
  3.7 Booth's Algorithm-Based Signed Multipliers
    3.7.1 Booth-Mult Design
    3.7.2 Booth-Opt Design
    3.7.3 Booth-Par Design
  3.8 Constant Multipliers
  3.9 Results and Discussion
    3.9.1 Experimental Setup and Tool Flow
    3.9.2 Performance Comparison of the Proposed Accurate Unsigned Multiplier Acc
    3.9.3 Performance Comparison of the Proposed Accurate Signed Multiplier with the State-of-the-Art Accurate Multipliers
    3.9.4 Performance Comparison of the Proposed Constant Multiplier with the State-of-the-Art Accurate Multipliers
  3.10 Conclusion
  References
4 Approximate Multipliers
  4.1 Introduction
  4.2 Related Work
  4.3 Unsigned Approximate Multipliers
    4.3.1 Approximate 4×4 Multiplier: Approx-1
    4.3.2 Approximate 4×4 Multiplier: Approx-2
    4.3.3 Approximate 4×4 Multiplier: Approx-3
  4.4 Designing Higher-Order Approximate Unsigned Multipliers
    4.4.1 Accurate Adders for Implementing 8×8 Approximate Multipliers from 4×4 Approximate Multipliers
    4.4.2 Approximate Adders for Implementing Higher-order Approximate Multipliers
  4.5 Approximate Signed Multipliers: Booth-Approx
  4.6 Results and Discussion
    4.6.1 Experimental Setup and Tool Flow
    4.6.2 Evaluation of the Proposed Approximate Unsigned Multipliers
    4.6.3 Evaluation of the Proposed Approximate Signed Multiplier
