© Copyright 2013
Hadi Esmaeilzadeh

Approximate Acceleration for a Post Multicore Era

Hadi Esmaeilzadeh

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

University of Washington
2013

Reading Committee:
Doug Burger, Chair
Luis Ceze, Chair
Kathryn McKinley
Mark Oskin

Program Authorized to Offer Degree:
Department of Computer Science and Engineering

University of Washington

Abstract

Approximate Acceleration for a Post Multicore Era

Hadi Esmaeilzadeh

Co-Chairs of the Supervisory Committee:
Professor Doug Burger
Microsoft Research
Associate Professor Luis Ceze
University of Washington

Since 2004, the microprocessor industry has shifted to multicore scaling (increasing the number of cores per die each technology generation) as its principal strategy for continuing performance growth. This work first studies the interplay between the rise of multicore processors and the rise of managed languages (e.g., Java) in the past decade. Then, this dissertation looks into the future, studies the trends in transistor scaling, and investigates whether multicore scaling will sustain the traditional performance improvements that have been the driving force for the entire computing industry over the past forty years. The results from our work challenge the conventional wisdom that multicore scaling is a viable path for exploiting increased transistor counts and sustaining historical performance trends. As the number of cores increases, power constraints may prevent powering all cores at their full speed, requiring a fraction of the cores to be powered off at all times. According to our models, the fraction of these chips that is dark may be as much as 50% within three process generations. The low utility of this dark silicon may prevent scaling to higher core counts and may ultimately undermine the economic viability of continued silicon scaling.
Our study highlights that radical departures from conventional approaches may be necessary to sustain the traditional rate of performance improvements in general-purpose computing. Such techniques should provide significant performance and energy efficiency gains across a wide range of applications. This dissertation then proposes a new direction for general-purpose computing that leverages approximation to address the dark silicon challenge. While conventional techniques, such as dynamic voltage and frequency scaling, trade performance for energy, general-purpose approximate computing trades error for both performance and energy gains. We propose variable-precision architectures, a framework spanning the ISA (Instruction Set Architecture) down to the transistor-level implementation, which allows conventional von Neumann processors to trade accuracy for energy at the granularity of single instructions. Then, we propose an end-to-end solution, from the programming model to the microarchitecture, that leverages an approximate algorithmic transformation to automatically convert a hot code region from a von Neumann model to a neural model. This solution and its associated algorithmic transformation enable a new class of accelerators, called Neural Processing Units (NPUs), with implementation potential in both the digital and the analog domain. This work shows significant gains in both performance and energy when the abstraction of full accuracy is relaxed in general-purpose computing. The results from this dissertation show that general-purpose approximate computing can be a path forward when the gains from conventional approaches are diminishing.

Table of Contents

List of Figures
List of Tables
Chapter 1: Industry of New Possibilities
    1.1 Moore’s Law Enables New Possibilities
    1.2 Dennard Scaling Enables Moore’s Law
    1.3 End of Dennard Scaling and the Multicore Era
    1.4 General-Purpose Approximate Computing for a Post Multicore Era
    1.5 Dissertation Organization and Contributions
Chapter 2: Looking Back: Multicores, Measured Power, and Modern Workloads
    2.1 Introduction
    2.2 Overview
    2.3 Methodology
    2.4 Perspective
    2.5 Feature Analysis
    2.6 Concluding Remarks
Chapter 3: Looking Forward: Dark Silicon and the End of Multicore Era
    3.1 Introduction
    3.2 Overview
    3.3 Device Model (M-Device)
    3.4 Core Model (M-Core)
    3.5 Multicore Model (M-CMP)
    3.6 Combining Models
    3.7 Scaling and Future Multicores
    3.8 Model Assumptions, Validation, and Limitations
    3.9 Concluding Remarks
Chapter 4: Variable-Precision von Neumann Architectures
    4.1 Introduction
    4.2 An ISA for Disciplined Approximate Computation
    4.3 Design Space
    4.4 Truffle: A Dual-Voltage Microarchitecture for Disciplined Approximation
    4.5 Experimental Results
    4.6 Concluding Remarks
Chapter 5: From a von Neumann to a Hybrid von Neumann-Neural Model of Computing
    5.1 Introduction
    5.2 Overview
    5.3 Programming Model
    5.4 Compilation Workflow
    5.5 Architecture Design for NPU Acceleration
    5.6 Neural Processing Unit
    5.7 Evaluation
    5.8 Limitations and Future Directions
    5.9 Concluding Remarks
Chapter 6: Related Work
    6.1 Power-Performance Measurement
    6.2 Modeling Multicores
    6.3 Approximate Computing
    6.4 Voltage Overscaling
    6.5 Information Flow Tracking
    6.6 General-Purpose Configurable Accelerators
    6.7 Neural Networks
Chapter 7: A Path Forward
Bibliography