Embedded Systems

Anup Kumar Das • Akash Kumar • Bharadwaj Veeravalli • Francky Catthoor

Reliable and Energy Efficient Streaming Multiprocessor Systems

Series editors
Nikil D. Dutt, Irvine, CA, USA
Grant Martin, Santa Clara, CA, USA
Peter Marwedel, Dortmund, Germany

This Series addresses current and future challenges pertaining to embedded hardware, software, specifications and techniques. Titles in the Series cover a focused set of embedded topics relating to traditional computing devices as well as high-tech appliances used in newer, personal devices, and related topics. The material will vary by topic but in general most volumes will include fundamental material (when appropriate), methods, designs and techniques.

More information about this series at http://www.springer.com/series/8563

Anup Kumar Das
Electrical and Computer Engineering Department
Drexel University, Bossone Research Enterprise Center
Philadelphia, PA, USA

Akash Kumar
Chair for Processor Design
Dresden University of Technology
Dresden, Germany

Bharadwaj Veeravalli
Department of ECE
National University of Singapore
Singapore, Singapore

Francky Catthoor
IMEC
Heverlee, Belgium

ISSN 2193-0155
ISSN 2193-0163 (electronic)
Embedded Systems
ISBN 978-3-319-69373-6
ISBN 978-3-319-69374-3 (eBook)
https://doi.org/10.1007/978-3-319-69374-3
Library of Congress Control Number: 2017956340

© Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This book is dedicated to those who continue the pursuit of knowledge, despite the obstacles life presents. You hold the key to our future. May God bless you with determination on your journey. Never let those doubts or negativity ruin your spirit. Be steadfast in your quest for knowledge. God Bless.

Preface

As the performance demands of applications (e.g., multimedia) are growing, multiple processing cores are integrated together to form multiprocessor systems. Energy minimization is a primary optimization objective for these systems. An emerging concern for designs at deep-submicron technology nodes (65 nm and below) is lifetime reliability, as escalating power density and hence temperature variation continue to accelerate wear-out, leading to a growing prominence of device defects.
As such, reliability and energy need to be incorporated in the multiprocessor design methodology, addressing two key aspects:

• lifetime amelioration, i.e., improving the lifetime reliability through energy- and performance-aware intelligent task mapping
• graceful degradation, i.e., determining the task mapping for different fault-scenarios while minimizing the energy consumption and providing a graceful performance degradation

In this book, a platform-based design methodology is first proposed to minimize temperature-related wear-out. Fundamental to this methodology is a temperature model that predicts the temperature of a core by incorporating not only its dependence on the voltage and frequency of operation (temporal effect) but also its dependence on the temperature of the surrounding cores (spatial effect). The proposed temperature model is integrated in a fast gradient-based heuristic that controls the voltage and frequency of the cores to limit the average and peak temperature, leading to a longer lifetime while simultaneously minimizing the energy consumption.

A design flow is then proposed as a part of the hardware–software co-design methodology to determine the minimum number of cores and the size of the FPGA fabric of a reconfigurable multiprocessor system. The objective is to maximize the lifetime reliability of the cores while satisfying a given area, performance, and energy budget. The proposed flow incorporates individual as well as concurrent applications with different performance requirements and thermal behaviors. While the existing studies determine platform architecture for energy and area minimization, this is the first approach for reconfigurable multiprocessor system design considering lifetime reliability together with multi-application use-cases.

To provide graceful performance degradation in the presence of faults, a reactive fault-tolerance technique is also proposed that explores different task mapping alternatives to minimize energy consumption while guaranteeing throughput for all processor fault-scenarios.
Directed acyclic graphs (DAGs) and synchronous data flow graphs (SDFGs) are used to model applications, making the proposed methodology applicable to streaming multimedia and non-multimedia applications. Fundamental to this approach is a novel scheduling algorithm based on self-timed execution, which minimizes both the schedule storage overhead and the run-time schedule construction overhead. Unlike the existing approaches, which consider task mapping only, the proposed technique considers task mapping and scheduling in an integrated manner, achieving significant improvement with respect to these state-of-the-art approaches.

Finally, an adaptive run-time manager is designed for lifetime amelioration of multiprocessor systems by managing thermal variations, both within (intra) and across (inter) applications. Core to this approach is a reinforcement learning algorithm, which interfaces with the on-board thermal sensors and controls the voltage and frequency of operation and the thread-to-core affinity over time. This approach is built on top of the design-time analysis to minimize learning time, address run-time variability, and support new applications. The overall objective of the book is to provide a complete and systematic design solution for reliable and energy-efficient multiprocessor systems.

Texas, United States          Anup Kumar Das
Dresden, Germany              Akash Kumar
Singapore, Singapore          Bharadwaj Veeravalli
Heverlee, Belgium             Francky Catthoor

Contents

1 Introduction
  1.1 Trends in Multiprocessor Systems Design
    1.1.1 Trends in Transistor Scaling
    1.1.2 Trends in Microprocessor Design
    1.1.3 Trends in Multiprocessor Systems
  1.2 Multiprocessor System Classification
    1.2.1 Memory-Based Classification
    1.2.2 Processor-Based Classification
  1.3 Multiprocessor System Design Flow
    1.3.1 Platform-Based Design
    1.3.2 Hardware–Software Co-design
  1.4 Design Challenges for Multiprocessor Systems
    1.4.1 Energy Concern for Multiprocessor Systems
    1.4.2 Reliability Concern for Multiprocessor Systems
  1.5 Reliable and Energy Efficient Multiprocessor Design
    1.5.1 Design-Time Methodology
    1.5.2 Run-Time Methodology
  1.6 Key Highlights of This Book
  References
2 Operational Semantics of Application and Reliability Model
  2.1 Application Model as Synchronous Data Flow Graphs
  2.2 Wear-Out Related Reliability Model
    2.2.1 Device-Level Reliability Modeling
    2.2.2 Core-Level Reliability Modeling
    2.2.3 System-Level Reliability Modeling
  References
3 Literature Survey on System-Level Optimizations Techniques
  3.1 Design-Time Based Reliability and Energy Optimization
    3.1.1 Task Mapping Approaches for Platform-Based Design
    3.1.2 Existing Approaches on Hardware–Software Co-design
    3.1.3 Existing Approaches on Reactive Fault-Tolerance
  3.2 Run-Time Based Reliability and Energy Optimization
  References
4 Reliability and Energy-Aware Platform-Based Multiprocessor Design
  4.1 Introduction
  4.2 Problem Formulation
    4.2.1 Application Model
    4.2.2 Architecture Model
    4.2.3 Mapping Representation
    4.2.4 MTTF Computation
    4.2.5 Energy Computation
    4.2.6 Reliability-Energy Joint Metric
  4.3 Proposed Temperature Model
  4.4 Computing Temperature from a Schedule
  4.5 Design Methodology
    4.5.1 Reliability Optimization for Individual Application
    4.5.2 Reliability Optimization for Use-Cases
  4.6 Experiments and Discussions
    4.6.1 Time Complexity
    4.6.2 Validation of the Temperature Model
    4.6.3 Comparison with Accurate Temperature Model
    4.6.4 Impact of Temperature Misprediction
    4.6.5 MTTF Improvement Considering Task Remapping
    4.6.6 Reliability and Energy Improvement
    4.6.7 Use-Case Optimization Results
  4.7 Remarks
  References
5 Reliability and Energy-Aware Co-design of Multiprocessor Systems
  5.1 Introduction
  5.2 Reliability-Aware Hardware–Software Task Partitioning
    5.2.1 Application and Architecture Model
    5.2.2 Reliability Modeling Considering Single Actor
    5.2.3 Reliability Modeling with Multiple Interconnected Actors
    5.2.4 Lifetime Reliability and Transient Fault Reliability Trade-Off
    5.2.5 Hardware–Software Partitioning Flow