ABSTRACT

WADHAVKAR, SALIL VIJAY. Architecting a Workload-agnostic Heterogeneous Multi-core Processor. (Under the direction of Dr. Eric Rotenberg.)

Improving single-thread performance remains an important challenge. Heterogeneous multi-cores offer the potential of improving single-thread performance by providing a number of core types that capture a wide range of application behaviors. Some prior approaches to choosing the constituent cores in a heterogeneous multi-core have focused on reducing power consumption by employing monotonic cores. Other approaches that aim to improve performance assume a priori knowledge of the workload. It is uncertain how such workload-specific approaches would perform if the workload changes in the future.

This dissertation addresses the question of choosing the cores in a heterogeneous multi-core in a workload-agnostic manner. The process of selecting the constituent cores is completely independent of any benchmark suite. We present several approaches to choosing cores, and show that the resulting multi-core delivers high performance for a large number of application phases.

We classify applications into one of four categories of kernels: pointer-chasing, array manipulation, arbitrary serial and arbitrary parallel. We systematically vary instruction-level parameters and evaluate the highest performing heterogeneous multi-cores using synthetically generated kernels. Since the resulting multi-core is workload-agnostic, its performance on real application phases is almost as good as a customized heterogeneous multi-core. Moreover, we demonstrate potential pitfalls of customization by showing that multi-cores tuned to a subset of the actual workload may perform poorly on the entire workload.

We use statistical tools such as classification trees to understand the relationships between instruction-level parameters and core suitability. The classification trees are used as a starting point for application steering mechanisms. We show that an application steering mechanism based on classification trees performs better than random steering on average.

© Copyright 2012 by Salil Vijay Wadhavkar
All Rights Reserved

Architecting a Workload-agnostic Heterogeneous Multi-core Processor

by
Salil Vijay Wadhavkar

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Engineering

Raleigh, North Carolina
2012

APPROVED BY:
Dr. Eric Rotenberg (Chair of Advisory Committee)
Dr. James Tuck
Dr. Huiyang Zhou
Dr. Steffen Heber

DEDICATION

To Aai, Baba, and Adu

BIOGRAPHY

Salil Wadhavkar is from Mumbai, India. Prior to joining the Ph.D. program at NCSU, he received his M.S.E. degree in Electrical Engineering from Temple University in 2006, and B.E. degree in Electronics Engineering from the University of Mumbai, India in 2003. His professional interests include computer architecture, heterogeneous multi-cores, workload analysis, and the impact of technology on everyday life. During his PhD program, he had the opportunity to intern at Intel Corp. to work on heterogeneous systems. He is a student member of the IEEE, ACM-SIGARCH and Phi Kappa Phi. In his spare time, he plays badminton, learns to play the guitar, and pursues photography.

ACKNOWLEDGEMENTS

The last six years as a PhD student at NC State have been some of my most memorable years. Through times of difficulty, stress, and the general misery of grad life, many individuals have been a source of strength, courage, support and love, and have made this journey immensely fulfilling. These paragraphs represent a feeble attempt to show my gratitude to them in a rather clumsy form of expression, that is, language.

I would like to thank my parents for their unconditional love, encouragement, and support through the years. My achievements are a result of their nurture and the values they instilled in me. Their support and sacrifices will always motivate and inspire me to wholeheartedly pursue my dreams.

I have been very fortunate to work with my advisor, Dr. Eric Rotenberg. His ability to think deeply and critically, as well as articulate complex ideas with clarity has always inspired me.
His passion for computer architecture has inculcated in me a strong work ethic of always doing my best, no matter how trivial the task. Most importantly, his encouragement and patience have taught me the value of persistence and diligence in accomplishing one's goal. I am also grateful to him for providing me with financial support over the years.

I want to thank my committee members Dr. James Tuck, Dr. Huiyang Zhou, and Dr. Steffen Heber for their constructive feedback and comments. Their suggestions have been very valuable in improving my dissertation.

Linda Fontes, Elaine Hardin and Sandy Bronson have always provided prompt and helpful administrative support. I would like to thank them for quickly and smoothly processing all paperwork, especially close to deadlines.

This research was supported by NSF grant CCF-0811707 and gifts from Intel and IBM. Any opinions, findings, and conclusions or recommendations expressed herein are those of the author and do not necessarily reflect the views of the National Science Foundation.

My fellow CESR dwellers have been an exciting and intellectually stimulating group of people to work with. In addition to being wonderful colleagues, they have been good friends. I want to thank Niket Choudhary, Sandeep Navada, Muawya Al-Otoom, Elliott Forbes, Hashem Hashemi, Rami Al-Sheikh, Brandon Dwiel, Mark Dechene, Jayneel Gandhi, Hiran Mayukh, Tanmay Shah, and Rajeshwar Vanka for many brainstorming sessions, insightful discussions, and witty banter.

Outside of CESR, my friends have been a source of diversion during stressful times. Weekend get-togethers and the occasional poker nights with Jitendra Kumar, Pradeep Sharma, Adwait Bachuwar, Prabhakar Tembhurne, Shriraj Misal, Amit Naik, and Shreekanth Pavani are most memorable, and provided a much needed break from the monotony of grad life. Conversations and virtual hangouts with my college buddies Ninad Pradhan, Rahul Saxena, Raghvendra Cowlagi, and Robert Chettiar reminded me of life outside and beyond grad school.

And finally, my wife and best friend, Aditi, has been my biggest source of inspiration, encouragement and unrelenting support.
It is difficult to express in these woefully limited words how much I have cherished her love and support. She has been by my side through the toughest of times, providing the strength to successfully persevere through grad school. Without her patience and unflagging faith in my abilities, this dissertation would never have materialized.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1 Introduction
  1.1 Contributions
  1.2 Outline

Chapter 2 Related Work
  2.1 Selection of Cores
  2.2 Statistical Simulation
  2.3 Application Steering

Chapter 3 Evaluation Methodology
  3.1 Design Space
  3.2 Realistic Pruning of the Design Space
  3.3 FabSim: The Cycle-accurate FabScalar Simulator
    3.3.1 Canonical Interfaces
    3.3.2 Modeling of Clocked Structures
    3.3.3 Modeling pipeline depth
    3.3.4 Simulator Optimizations and Features
    3.3.5 FabSim/RTL co-simulation
    3.3.6 Validation of FabSim
  3.4 Benchmarks

Chapter 4 Heterogeneous Multi-core Design
  4.1 G21 - The Proposed Workload-agnostic Heterogeneous Multi-core
  4.2 Performance Analysis of G21
  4.3 Other Workload-agnostic Design Approaches