Dynamic Reconfigurable Architectures and Transparent Optimization Techniques Antonio Carlos Schneider Beck Fl. (cid:2) Luigi Carro Dynamic Reconfigurable Architectures and Transparent Optimization Techniques Automatic Acceleration of Software Execution Prof.AntonioCarlosSchneiderBeckFl. Prof.LuigiCarro InstitutodeInformática InstitutodeInformática UniversidadeFederaldoRioGrande UniversidadeFederaldoRioGrande doSul(UFRGS) doSul(UFRGS) CaixaPostal15064 CaixaPostal15064 CampusdoVale,BlocoIV CampusdoVale,BlocoIV PortoAlegre PortoAlegre Brazil Brazil [email protected] [email protected] ISBN978-90-481-3912-5 e-ISBN978-90-481-3913-2 DOI10.1007/978-90-481-3913-2 SpringerDordrechtHeidelbergLondonNewYork LibraryofCongressControlNumber:2010921831 ©SpringerScience+BusinessMediaB.V.2010 Nopartofthisworkmaybereproduced,storedinaretrievalsystem,ortransmittedinanyformorby anymeans,electronic,mechanical,photocopying,microfilming,recordingorotherwise,withoutwritten permissionfromthePublisher,withtheexceptionofanymaterialsuppliedspecificallyforthepurpose ofbeingenteredandexecutedonacomputersystem,forexclusiveusebythepurchaserofthework. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) ToSabrina, for herunderstanding andsupport To AntônioandLéia, for thecontinuousencouragement To Ulisses,mayhisjourneybefullofjoy To Érika,for allour moments To Cesare, Estherand Beti,forbeingthere Preface As Moore’s law is losing steam, one already sees the phenomenon of clock fre- quencyreductioncausedbytheexcessivepowerdissipationingeneralpurposepro- cessors.Atthesametime,embeddedsystemsaregettingmoreheterogeneous,char- acterizedbyahighdiversityofcomputationalmodelscoexistinginasingledevice. Therefore,asinnovativetechnologiesthatwillcompletelyorpartiallyreplacesili- conarearising,newarchitecturalalternativesarenecessary. Althoughreconfigurablecomputinghasalreadyshowntobeapotentialsolution when it comes to accelerate specific code with a small power budget, significant speedupsareachievedjustin verydedicateddatafloworientedsoftware, failingto capture the reality of nowadays complex heterogeneous systems. Moreover, one importantcharacteristicofanynewarchitectureisthatitshouldbeabletoexecute legacycode,sincetherehasalreadybeenalargeamountofinvestmentintowriting softwarefordifferentapplications.Thewidespreadusageofreconfigurabledevices isstillwithheldbytheneedofspecialtoolsandcompilers,whichclearlypreclude reuseoflegacycodeanditsportability. Theauthorshavewrittenthisbookwiththeaforementionedlimitationsinmind. Therefore,thisbook,whichisdividedinsevenchapters,startspresentingthemain challengescomputerarchitecturesarefacingthesedays.Then,adetailedstudyon the usage of reconfigurable systems, their main principles, characteristics, poten- tial and classifications is done. A separate chapter is dedicated to present several case studies, with a critical analysis on their main advantages and drawbacks, and thebenchmarksusedfortheirevaluation.Thisanalysiswilldemonstratethatsuch architectures need to attack a diverse range of applications with very different be- haviors,besidessupportingcodecompatibility,thatis,theneedfornomodification inthesource orbinarycodes. This provesthatmore mustbedonetobringrecon- figurable computing to be used as main stream computing: dynamic optimization techniques.Therefore,binaryTranslationanddifferenttypesofreuse,withseveral examples, are evaluated. Finally, works that combine both reconfigurable systems and dynamic techniques are discussed, and a quantitative analysis of one of these examplesispresented.Thebookendswithsomedirectionsthatcouldinspirenew fieldsofresearch. vii viii Preface The main purpose of this book is to introduce reconfigurable systems and dy- namicoptimizationtechniquesto the readers, using several examples,so it can be asourceofreferencewheneverthereaderneeds.Theauthorshopeyouenjoyit,as theyhaveenjoyedmakingtheresearchthatresultedinthisbook. PortoAlegre AntonioCarlosSchneiderBeckFl. LuigiCarro Acknowledgements The authors would like to express their gratitude to the friends and colleagues at InstitutodeInformaticaofUniversidadeFederaldoRioGrandedoSul,andtogive aspecialthankstoallthepeopleintheEmbeddedSystemslaboratory,whoduring severalmomentscontributedforthisresearchformanyyears. The authors would also like to thank the Brazilian research support agencies, CAPESandCNPq. ix Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 MainMotivations . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.1 OvercomingSomeLimitsoftheParallelism . . . . . . . . 4 1.2.2 TakingAdvantageofCombinationalandReconfigurable Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.3 SoftwareCompatibilityandReuseofExistentBinaryCode 7 1.2.4 IncreasingYieldandReducingManufactureCosts . . . . . 8 1.3 ThisBook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 ReconfigurableSystems . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 BasicPrinciples . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.1 ReconfigurationSteps . . . . . . . . . . . . . . . . . . . . 15 2.3 UnderlyingExecutionMechanism . . . . . . . . . . . . . . . . . 17 2.4 AdvantagesofUsingReconfigurableLogic . . . . . . . . . . . . . 20 2.4.1 Application . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.4.2 AnInstructionMergingExample . . . . . . . . . . . . . . 22 2.5 ReconfigurableLogicClassification . . . . . . . . . . . . . . . . . 24 2.5.1 CodeAnalysisandTransformation . . . . . . . . . . . . . 24 2.5.2 RUCoupling . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.5.3 Granularity . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.5.4 InstructionTypes. . . . . . . . . . . . . . . . . . . . . . . 29 2.5.5 Reconfigurability . . . . . . . . . . . . . . . . . . . . . . 30 2.6 Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.6.1 HeterogeneousBehavioroftheApplications . . . . . . . . 31 2.6.2 PotentialforUsingFineGrainedReconfigurableArrays . . 34 2.6.3 CoarseGrainReconfigurableArchitectures . . . . . . . . . 38 2.6.4 ComparingBothGranularities. . . . . . . . . . . . . . . . 41 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 xi xii Contents 3 DeploymentofReconfigurableSystems . . . . . . . . . . . . . . . . . 45 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.2 ExamplesofReconfigurableArchitectures . . . . . . . . . . . . . 46 3.2.1 Chimaera . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.2.2 GARP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.2.3 REMARC . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.2.4 Rapid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.2.5 Piperench(1999). . . . . . . . . . . . . . . . . . . . . . . 57 3.2.6 Molen . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.2.7 Morphosys . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.2.8 ADRES . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.2.9 Concise . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.2.10 PACT-XPP . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.2.11 RAW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.2.12 Onechip . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.2.13 Chess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 3.2.14 PRISMI . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.2.15 PRISMII. . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.2.16 Nano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 3.3 RecentDataflowArchitectures . . . . . . . . . . . . . . . . . . . 81 3.4 SummaryandComparativeTables . . . . . . . . . . . . . . . . . 83 3.4.1 OtherReconfigurableArchitectures . . . . . . . . . . . . . 83 3.4.2 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . 84 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4 DynamicOptimizationTechniques . . . . . . . . . . . . . . . . . . . 95 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.2 BinaryTranslation . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.2.1 MainMotivations . . . . . . . . . . . . . . . . . . . . . . 95 4.2.2 BasicConcepts. . . . . . . . . . . . . . . . . . . . . . . . 97 4.2.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.2.4 Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.3 Reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.3.1 InstructionReuse . . . . . . . . . . . . . . . . . . . . . . 109 4.3.2 ValuePrediction . . . . . . . . . . . . . . . . . . . . . . . 110 4.3.3 BlockReuse . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.4 TraceReuse . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.3.5 DynamicTraceMemoizationandRST . . . . . . . . . . . 114 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5 DynamicDetectionandReconfiguration . . . . . . . . . . . . . . . . 119 5.1 WarpProcessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.1.1 TheReconfigurableArray . . . . . . . . . . . . . . . . . . 120 5.1.2 HowTranslationWorks . . . . . . . . . . . . . . . . . . . 121 5.1.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Description: