Bert Moons · Daniel Bankman
Marian Verhelst
Embedded Deep Learning
Algorithms, Architectures and Circuits for Always-on Neural Network Processing
Bert Moons
ESAT-MICAS, KU Leuven
Leuven, Belgium

Daniel Bankman
Department of Electrical Engineering, Stanford University
Stanford, CA, USA

Marian Verhelst
ESAT-MICAS, KU Leuven
Leuven, Belgium
ISBN 978-3-319-99222-8    ISBN 978-3-319-99223-5 (eBook)
https://doi.org/10.1007/978-3-319-99223-5
Library of Congress Control Number: 2018956280
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Lise
Preface
Although state of the art in many typical machine learning tasks, deep learning algorithms are very costly in terms of energy consumption, due to their large number of required computations and huge model sizes. Because of this, deep learning applications on battery-constrained wearables have only been possible through wireless connections with a resourceful cloud. This setup has several drawbacks. First, there are privacy concerns. Cloud computing requires users to share their raw data—images, video, locations, speech—with a remote system. Most users are not willing to do this. Second, the cloud setup requires users to be connected all the time, which is infeasible given current cellular coverage. Furthermore, real-time applications require low-latency connections, which cannot be guaranteed using the current communication infrastructure. Finally, wireless connections are very inefficient, requiring too much energy per transferred bit for real-time data transfer on energy-constrained platforms. All these issues—privacy, latency/connectivity, and costly wireless connections—can be resolved by moving towards computing at the edge. Finding ways to do this is the main topic of this book. It focuses on techniques to minimize the energy consumption of deep learning algorithms for embedded applications on battery-constrained wearable edge devices.
Computing at the edge is only possible if these deep learning algorithms can be run in a more energy-efficient way, within the energy and power budget of the computing platform available on a wearable device. In order to achieve this, several innovations are necessary at all levels of an application's design hierarchy. Smarter applications can be developed for more statistically efficient deep learning algorithms, which in turn should run on optimized hardware platforms built on specifically tailored circuits. Finally, designers should not focus on any of these fields separately but should co-optimize hardware and software to create minimum-energy deep learning platforms. This book is an overview of possible solutions towards designing such systems.
Leuven, Belgium          Bert Moons
Stanford, CA, USA        Daniel Bankman
Leuven, Belgium          Marian Verhelst
May 2018
Acknowledgments
The authors would like to thank several people and institutions for their valuable contributions to this work.
We thank Tom Michiels, Edith Beigne, Boris Murmann, Wim Dehaene, Tinne Tuytelaars, Ludo Froyen, and Hugo Hens for their comments and discussion. We thank IWT, Intel, Qualcomm, and Synopsys for their funding, software, and support. Thanks to Florian Darve and Marie-Sophie Redon at CEA-LETI and to Etienne Wouters and Luc Folens at imec/IC-Link for their back-end support in ASIC design.
We would also like to acknowledge all coauthors for their contributions to this manuscript. Thanks to Koen Goetschalckx, Nick Van Berckelaer, Bert De Brabandere, Lita Yang, Roel Uytterhoeven, Martin Andraud, and Steven Lauwereins.
Contents
1 Embedded Deep Neural Networks .............................................. 1
  1.1 Introduction ........................................................... 1
  1.2 Machine Learning ....................................................... 2
    1.2.1 Tasks, T ........................................................... 3
    1.2.2 Performance Measures, P ............................................ 3
    1.2.3 Experience, E ...................................................... 3
  1.3 Deep Learning .......................................................... 4
    1.3.1 Deep Feed-Forward Neural Networks .................................. 6
    1.3.2 Convolutional Neural Networks ...................................... 7
    1.3.3 Recurrent Neural Networks .......................................... 15
    1.3.4 Training Deep Neural Networks ...................................... 17
  1.4 Challenges for Embedded Deep Neural Networks ........................... 23
  1.5 Book Contributions ..................................................... 25
  References ................................................................. 27
2 Optimized Hierarchical Cascaded Processing ................................. 33
  2.1 Introduction ........................................................... 33
  2.2 Hierarchical Cascaded Systems .......................................... 35
    2.2.1 Generalizing Two-Stage Wake-Up Systems ............................. 35
    2.2.2 Hierarchical Cost, Precision, and Recall ........................... 36
    2.2.3 A Roofline Model for Hierarchical Classifiers ...................... 38
    2.2.4 Optimized Hierarchical Cascaded Sensing ............................ 41
  2.3 General Proof of Concept ............................................... 41
    2.3.1 System Description ................................................. 42
    2.3.2 Input Statistics ................................................... 43
    2.3.3 Experiments ........................................................ 44
    2.3.4 Conclusion ......................................................... 47
  2.4 Case Study: Hierarchical, CNN-Based Face Recognition ................... 47
    2.4.1 A Face Recognition Hierarchy ....................................... 47
    2.4.2 Hierarchical Cost, Precision, and Recall ........................... 49
    2.4.3 An Optimized Face Recognition Hierarchy ............................ 50
  2.5 Conclusion ............................................................. 52
  References ................................................................. 54
3 Hardware-Algorithm Co-optimizations ........................................ 55
  3.1 An Introduction to Hardware-Algorithm Co-optimization .................. 55
    3.1.1 Exploiting Network Structure ....................................... 56
    3.1.2 Enhancing and Exploiting Sparsity .................................. 59
    3.1.3 Enhancing and Exploiting Fault-Tolerance ........................... 60
  3.2 Energy Gains in Low-Precision Neural Networks .......................... 63
    3.2.1 Energy Consumption of Off-Chip Memory-Access ....................... 64
    3.2.2 Generic Hardware Platform Modeling ................................. 64
  3.3 Test-Time Fixed-Point Neural Networks .................................. 65
    3.3.1 Analysis and Experiments ........................................... 66
    3.3.2 Influence of Quantization on Classification Accuracy ............... 66
    3.3.3 Energy in Sparse FPNNs ............................................. 69
    3.3.4 Results ............................................................ 71
    3.3.5 Discussion ......................................................... 72
  3.4 Train-Time Quantized Neural Networks ................................... 73
    3.4.1 Training QNNs ...................................................... 74
    3.4.2 Energy in QNNs ..................................................... 77
    3.4.3 Experiments ........................................................ 77
    3.4.4 Results ............................................................ 79
    3.4.5 Discussion ......................................................... 83
  3.5 Clustered Neural Networks .............................................. 83
  3.6 Conclusion ............................................................. 84
  References ................................................................. 85
4 Circuit Techniques for Approximate Computing ............................... 89
  4.1 Introducing the Approximate Computing Paradigm ......................... 89
  4.2 Approximate Computing Techniques ....................................... 91
    4.2.1 Resilience Identification and Quality Management ................... 92
    4.2.2 Approximate Circuits ............................................... 93
    4.2.3 Approximate Architectures .......................................... 94
    4.2.4 Approximate Software ............................................... 95
    4.2.5 Discussion ......................................................... 95
  4.3 DVAFS: Dynamic-Voltage-Accuracy-Frequency-Scaling ...................... 96
    4.3.1 DVAFS Basics ....................................................... 96
    4.3.2 Resilience Identification for DVAFS ................................ 98
    4.3.3 Energy Gains in DVAFS .............................................. 100
  4.4 Performance Analysis of DVAFS .......................................... 102
    4.4.1 Block-Level DVAFS .................................................. 102
    4.4.2 System-Level DVAFS ................................................. 105
  4.5 Implementation Challenges of DVAFS ..................................... 107
    4.5.1 Functional Implementation of Basic DVA(F)S Building Blocks ......... 108
    4.5.2 Physical Implementation of DVA(F)S Building Blocks ................. 109
  4.6 Overview and Discussion ................................................ 111
  References ................................................................. 111
5 ENVISION: Energy-Scalable Sparse Convolutional Neural Network Processing .. 115
  5.1 Neural Network Acceleration ............................................ 115
  5.2 The Envision Processor Architecture .................................... 117
    5.2.1 Processor Datapath ................................................. 117
    5.2.2 On-Chip Memory Architecture ........................................ 121
    5.2.3 Hardware Support for Exploiting Network Sparsity ................... 122
    5.2.4 Energy-Efficient Flexibility Through a Custom Instruction Set ...... 124
    5.2.5 Conclusion and Overview ............................................ 125
  5.3 DVAS-Compatible Envision V1 ............................................ 125
    5.3.1 RTL-Level Hardware Support ......................................... 126
    5.3.2 Physical Implementation ............................................ 127
    5.3.3 Measurement Results ................................................ 129
    5.3.4 Envision V1 Overview ............................................... 135
  5.4 DVAFS-Compatible Envision V2 ........................................... 136
    5.4.1 RTL-Level Hardware Support ......................................... 136
    5.4.2 Physical Implementation ............................................ 138
    5.4.3 Measurement Results ................................................ 139
    5.4.4 Envision V2 Overview ............................................... 148
  5.5 Conclusion ............................................................. 149
  References ................................................................. 150
6 BINAREYE: Digital and Mixed-Signal Always-On Binary Neural Network Processing . 153
  6.1 Binary Neural Networks ................................................. 153
    6.1.1 Introduction ....................................................... 153
    6.1.2 Binary Neural Network Layers ....................................... 154
  6.2 Binary Neural Network Applications ..................................... 158
  6.3 A Programmable Input-to-Label Accelerator Architecture ................. 159
    6.3.1 256X: A Baseline BinaryNet Architecture ............................ 161
    6.3.2 SX: A Flexible DVAFS BinaryNet Architecture ........................ 170
  6.4 MSBNN: A Mixed-Signal 256X Implementation .............................. 173
    6.4.1 Switched-Capacitor Neuron Array .................................... 174
    6.4.2 Measurement Results ................................................ 175
    6.4.3 Analog Signal Path Overhead ........................................ 178
  6.5 BinarEye: A Digital SX Implementation .................................. 179
    6.5.1 An All-Digital Binary Neuron ....................................... 179
    6.5.2 Physical Implementation ............................................ 180