
Approximate Acceleration for a Post-Multicore Era

189 Pages·2014·6.28 MB·English


© Copyright 2013 Hadi Esmaeilzadeh

Approximate Acceleration for a Post-Multicore Era

Hadi Esmaeilzadeh

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2013

Reading Committee:
Doug Burger, Chair
Luis Ceze, Chair
Kathryn McKinley
Mark Oskin

Program Authorized to Offer Degree: Department of Computer Science and Engineering

University of Washington

Abstract

Approximate Acceleration for a Post-Multicore Era

Hadi Esmaeilzadeh

Co-Chairs of the Supervisory Committee:
Professor Doug Burger, Microsoft Research
Associate Professor Luis Ceze, University of Washington

Starting in 2004, the microprocessor industry shifted to multicore scaling (increasing the number of cores per die in each technology generation) as its principal strategy for continuing performance growth. This work first studies the interplay between the rise of multicore processors and the rise of managed languages, such as Java, over the past decade. The dissertation then looks into the future: it studies the trends in transistor scaling and investigates whether multicore scaling will sustain the traditional performance improvements that have been the driving force for the entire computing industry over the past forty years. Our results challenge the conventional wisdom that multicore scaling is a viable path for exploiting increased transistor counts and sustaining historical performance trends. As the number of cores increases, power constraints may prevent all cores from running at full speed, requiring a fraction of the cores to be powered off at all times. According to our models, the fraction of these chips that is dark may be as much as 50% within three process generations. The low utility of this dark silicon may prevent both scaling to higher core counts and, ultimately, the economic viability of continued silicon scaling.
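The power-budget argument above can be made concrete with a deliberately simplified back-of-envelope model. This sketch is not the dissertation's model (Chapter 3 combines separate device, core, and multicore models, M-Device, M-Core, and M-CMP); it assumes a fixed die area, a fixed power budget that is exactly met at generation 0, identical cores, and per-generation scaling factors (`core_growth`, `power_scaling`) that are purely hypothetical parameters.

```python
def dark_fraction(generations: int, core_growth: float, power_scaling: float) -> float:
    """Fraction of cores that must stay powered off (dark) after `generations`
    technology steps, under a toy model with hypothetical assumptions:

    * die area and chip power budget are fixed;
    * at generation 0, all cores run at full speed exactly within budget;
    * each generation multiplies the core count by `core_growth`;
    * each generation multiplies per-core power by `power_scaling`.
    """
    if generations < 0 or core_growth <= 0 or power_scaling <= 0:
        raise ValueError("generations must be >= 0 and scaling factors > 0")
    # Power needed to run every core at full speed, relative to the budget.
    demand = (core_growth * power_scaling) ** generations
    if demand <= 1.0:
        return 0.0  # everything fits in the budget: no dark silicon
    # Only budget/demand of the cores can be lit; the rest stay dark.
    return 1.0 - 1.0 / demand


if __name__ == "__main__":
    # Ideal Dennard scaling: per-core power halves as the core count doubles.
    print(dark_fraction(3, 2.0, 0.5))  # prints 0.0
    # No per-core power reduction at all: doubling the cores darkens half the chip.
    print(dark_fraction(1, 2.0, 1.0))  # prints 0.5
```

The two calls bracket the argument: when per-core power falls as fast as the core count grows, nothing goes dark, and any shortfall in power scaling compounds with every generation.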
Our study highlights that radical departures from conventional approaches may be necessary to sustain the traditional rate of performance improvement in general-purpose computing. Such departures should provide significant performance and energy-efficiency gains across a wide range of applications. This dissertation then proposes a new direction for general-purpose computing that leverages approximation to address the dark silicon challenge. While conventional techniques, such as dynamic voltage and frequency scaling, trade performance for energy, general-purpose approximate computing trades error for gains in both performance and energy. We propose variable-precision architectures, a framework spanning from the ISA (Instruction Set Architecture) to the transistor-level implementation, that allows conventional von Neumann processors to trade accuracy for energy at the granularity of single instructions. We then propose an end-to-end solution, from the programming model to the microarchitecture, that leverages an approximate algorithmic transformation to automatically convert a hot code region from a von Neumann model to a neural model. This solution and its associated algorithmic transformation enable a new class of accelerators, called Neural Processing Units (NPUs), with implementation potential in both the digital and the analog domain. This work shows significant gains in both performance and energy when the abstraction of full accuracy is relaxed in general-purpose computing. The results from this dissertation show that general-purpose approximate computing can be a path forward when the gains from conventional approaches are diminishing.

Table of Contents
List of Figures
List of Tables
Chapter 1: Industry of New Possibilities
  1.1 Moore's Law Enables New Possibilities
  1.2 Dennard Scaling Enables Moore's Law
  1.3 End of Dennard Scaling and the Multicore Era
  1.4 General-Purpose Approximate Computing for a Post-Multicore Era
  1.5 Dissertation Organization and Contributions
Chapter 2: Looking Back: Multicores, Measured Power, and Modern Workloads
  2.1 Introduction
  2.2 Overview
  2.3 Methodology
  2.4 Perspective
  2.5 Feature Analysis
  2.6 Concluding Remarks
Chapter 3: Looking Forward: Dark Silicon and the End of Multicore Era
  3.1 Introduction
  3.2 Overview
  3.3 Device Model (M-Device)
  3.4 Core Model (M-Core)
  3.5 Multicore Model (M-CMP)
  3.6 Combining Models
  3.7 Scaling and Future Multicores
  3.8 Model Assumptions, Validation, and Limitations
  3.9 Concluding Remarks
Chapter 4: Variable-Precision von Neumann Architectures
  4.1 Introduction
  4.2 An ISA for Disciplined Approximate Computation
  4.3 Design Space
  4.4 Truffle: A Dual-Voltage Microarchitecture for Disciplined Approximation
  4.5 Experimental Results
  4.6 Concluding Remarks
Chapter 5: From a von Neumann to a Hybrid von Neumann-Neural Model of Computing
  5.1 Introduction
  5.2 Overview
  5.3 Programming Model
  5.4 Compilation Workflow
  5.5 Architecture Design for NPU Acceleration
  5.6 Neural Processing Unit
  5.7 Evaluation
  5.8 Limitations and Future Directions
  5.9 Concluding Remarks
Chapter 6: Related Work
  6.1 Power-Performance Measurement
  6.2 Modeling Multicores
  6.3 Approximate Computing
  6.4 Voltage Overscaling
  6.5 Information Flow Tracking
  6.6 General-Purpose Configurable Accelerators
  6.7 Neural Networks
Chapter 7: A Path Forward
Bibliography
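The abstract's idea of converting a hot code region into a neural model (Chapter 5) can be illustrated in miniature: observe the region's inputs and outputs, train a small multilayer perceptron on them, and then call the trained network in place of the original code. This is a hypothetical pure-Python sketch, not the dissertation's compilation workflow or NPU hardware. The stand-in function (`hot_region`, here `math.sin` on [0, pi/2]), the 1-8-1 tanh topology, the learning rate, and the epoch count are all assumptions chosen for brevity.

```python
import math
import random


def hot_region(x: float) -> float:
    """Hypothetical approximable hot function the network will stand in for."""
    return math.sin(x)


class TinyMLP:
    """1 input, a `hidden`-unit tanh layer, 1 linear output; per-sample SGD."""

    def __init__(self, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, x: float) -> list:
        return [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]

    def __call__(self, x: float) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train(self, samples, lr: float = 0.02, epochs: int = 2000, seed: int = 0) -> None:
        rng = random.Random(seed)
        data = list(samples)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, target in data:
                h = self._hidden(x)
                y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                dy = y - target  # derivative of 0.5 * (y - target)**2 w.r.t. y
                for j, hj in enumerate(h):
                    # Backpropagate through tanh using w2[j] before it is updated.
                    dz = dy * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * dy * hj
                    self.w1[j] -= lr * dz * x
                    self.b1[j] -= lr * dz
                self.b2 -= lr * dy


# 1) Observe: record input/output pairs of the hot region on training inputs.
train_xs = [i * (math.pi / 2) / 31 for i in range(32)]
samples = [(x, hot_region(x)) for x in train_xs]

# 2) Train: fit the network to the observed behaviour.
net = TinyMLP(hidden=8, seed=0)
net.train(samples)

# 3) Replace: call the network instead of the original code and measure the error.
test_xs = [i * (math.pi / 2) / 99 for i in range(100)]
mae = sum(abs(net(x) - hot_region(x)) for x in test_xs) / len(test_xs)
print(f"mean absolute error on [0, pi/2]: {mae:.4f}")
```

Run in software, as here, the network is far slower than `math.sin` itself; the point of the NPU proposed in the abstract is to execute such networks on a dedicated accelerator, so that the approximation buys performance and energy rather than costing them.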
