Field Programmable Gate Arrays with Hardwired Networks on Chip

261 Pages·2012·2.15 MB·English

Field Programmable Gate Arrays with Hardwired Networks on Chip

DISSERTATION

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus prof. ir. K.C.A.M. Luyben, chairman of the Board for Doctorates, to be defended publicly on Tuesday 6 November 2012 at 15:00

by

MUHAMMAD AQEEL WAHLAH
Master of Science in Information Technology
Pakistan Institute of Engineering and Applied Sciences (PIEAS)
born in Lahore, Pakistan.

This dissertation has been approved by the promotor:
Prof. dr. K.G.W. Goossens

Copromotor:
Dr. ir. J.S.S.M. Wong

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. K.G.W. Goossens, Technische Universiteit Eindhoven, promotor
Dr. ir. J.S.S.M. Wong, Technische Universiteit Delft, copromotor
Prof. dr. S. Pillement, Technical University of Nantes, France
Prof. dr.-Ing. M. Hubner, Ruhr-Universitat-Bochum, Germany
Prof. dr. D. Stroobandt, University of Gent, Belgium
Prof. dr. K.L.M. Bertels, Technische Universiteit Delft
Prof. dr.ir. A.J. van der Veen, Technische Universiteit Delft, reserve member

ISBN: 978-94-6186-066-8

Keywords: Field Programmable Gate Arrays, Hardwired, Networks on Chip

Copyright © 2012 Muhammad Aqeel Wahlah

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission of the author.

Printed in The Netherlands

Acknowledgments

Today when I look back, I find it a very interesting journey filled with different emotions, i.e., joy and frustration, hope and despair, and laughter and sadness. At the same time, I feel that I am lucky enough to have some great people around, without whom the journey would not have been possible. I would like to express my gratitude to all of them as follows.

First of all I would like to convey my gratitude to Kees Goossens, my promoter and supervisor, for his erudite and invaluable supervision with sustained inspiration and incessant motivation.
He guided me to explore the challenging research problems while giving me complete flexibility, which provided the rationale to unleash my ingenuity and creativity along with an in-depth exploration of various research issues. Despite being a busy person, he still managed to find time to provide me with sufficient feedback. His encouragement and meticulous feedback wrapped in constructive criticism helped me to keep the impetus and to remain streamlined on the road of research that resulted in the triumphant completion of this work.

I would also like to thank the PhD committee, i.e., Kees Goossens, Sebastien Pillement, Dirk Stroobandt, Michael Hubner, Koen Bertels, and Stephan Wong for investing their precious time to read the thesis and providing me with their valuable feedback.

I am grateful to the Higher Education Commission (HEC) Pakistan for financially supporting my research work during the initial four years of my PhD, which enabled me to work and do research in the Computer engineering department of Technical University of Delft, one of the leading universities in the world.

I would like to pay my thanks to all of the colleagues from the Computer engineering department for their discussions and feedback. In particular, I want to thank Dr. Jae Young Hur for the many discussions, motivational talks and valuable guidance during my first two years of PhD. I also want to extend my thanks to Dr. Chunyang Guo for being such a nice friend and office mate in all those PhD years. Furthermore, I want to acknowledge the support of our chair secretary Lidwina Tromp, and administrators Erik de Vries and Eef Hartman in providing a good working environment.

I would like to pay my deepest gratitude to my parents (Muhammad Siddique Wahlah and Razia Sultana), my siblings (Anwar-us-Saeed, Riffat Shahid, Tasneem Khalid, Muhammad Shafique, Naseem Atif), and my in-laws (Razia Naveed, Afzal Naveed, and Saba Naveed) for their never-ending support, sincere prayers, and encouragement throughout my Ph.D studies. In particular, I am thankful and pay salute to my parents (Muhammad Siddique Wahlah and Razia Sultana) for their unconditional love and exceptional sacrifice.
I always found them standing beside me whenever I needed them. I must say that I cannot thank Almighty Allah enough, Who gave me such great parents.

Finally, I get to the persons to whom I owe the most for the completion of this journey. My wife Tahira Aqeel always stood beside me through this long journey. I must say that she endured all the efforts that were put in to produce the thesis. I would not have reached this point without her loving and caring support, and I want to take this opportunity to thank her from the core of my heart. I also want to present a bundle of thanks and love to my little three-year-old princess Ayesha Aqeel, whose smile and little acts always freshen up my mind and brighten up my days. So often she makes me feel how beautiful life can be, and how blessed a person I am.

I dedicate this thesis to all of my family members, and my advisor Prof. Kees Goossens.

Field Programmable Gate Arrays with Hardwired Networks on Chip
Muhammad Aqeel Wahlah

Abstract

Technology down-scaling and platform-based designs have enforced a number of application and architecture trends for system-on-chip (SOC) designs. A modern SOC is now a multi-functional machine that can execute a large number of complex applications by using tens or even hundreds of intellectual properties (IPs). Meanwhile, due to a number of constraints, e.g., short time to market, fickle market demands, and high non-recurring engineering (NRE) costs, Field Programmable Gate Arrays (FPGAs) have gained popularity for implementing SOC designs. The applications in an SOC can be dynamically started and stopped, thus forming multiple use-cases. The applications can also have diverse Quality-of-Service (QoS) constraints ranging from non real-time to soft, firm, and hard real-time constraints. At the same time the IP cores in an SOC are heterogeneous in nature and run at diverse clock frequencies. The IPs can be microprocessors, DSP slices, memories, ALUs, etc.
The increasing number and diversity of applications and IPs require a powerful on-chip communication architecture for quick integration and appropriate QoS. In contemporary FPGAs the on-chip interconnect would be soft, i.e., programmed in the configurable fabric.

The above-mentioned application and architecture trends have triggered a series of problems.
(1) An increasing number of applications on an FPGA often requires dynamic reconfiguration of an application, which in turn can produce interference with other running applications.
(2) The increasing complexity of an application may mean that it cannot be mapped entirely on the FPGA, which in turn can cause loss of state during intra-application dynamic partial reconfiguration.
(3) The diverse natures of applications make it difficult to fulfill the Quality-of-Service constraints of an application.
(4) Similarly, it is hard to achieve (physical) timing closure in an SOC, because of the increasing number and diversity of the IP cores.
(5) The technology down-scaling leads to FPGA architectures that are more prone to faults, e.g., configuration memories and logic elements in an FPGA can be stuck at a particular value.
(6) Because communication architecture and IPs are both mapped as soft IPs in the same logic plane of the FPGA, their placement has many restrictions to allow for dynamic partial reconfiguration.

In this thesis, we aim to address the above-mentioned problems by proposing the architecture and design flow of a new FPGA. As the main contribution of the thesis, we propose the FPGA architecture with a hardwired network on chip (HWNoC), and multiple test, configuration, and functional regions (TCFRs). We call it hardwired, because the NoC in an FPGA is built in silicon and not by using the reconfigurable elements. By having a HWNoC we can have a globally asynchronous locally synchronous (GALS) environment, which in turn ensures that data is not lost during inter-IP communication.
The HWNoC separates the communication and computation in two disjoint planes, which alleviates restrictions on the placement of IPs. As the second contribution of the thesis, we show how we can use the HWNoC to transport unified test, configuration, and functional data to TCFRs, for testing, faster configuration, and interference-free communication during execution of applications. As the third contribution of the thesis, we demonstrate how the proposed design flow ensures predictable application behavior by fulfilling the QoS constraints. We also present a 3-tier reconfiguration model that uses the HWNoC, which ensures contention-free communication at architecture level, to overcome the problems of interference and state-loss during inter-application and intra-application reconfiguration respectively. Another contribution of the thesis is that it proposes a non-intrusive test methodology that uses the HWNoC as a test access mechanism to test for the presence of faults and thereby establish the reliability of the FPGA architecture. In other words, the proposed methodology makes sure that applications are always reconfigured and executed on a reliable region of an FPGA, and without affecting the other running applications.

Table of contents

Acknowledgments
Abstract
List of Tables
List of Figures
List of Algorithms

1 Introduction
  1.1 Trends
    1.1.1 Application Point of View
    1.1.2 Architecture Point of View
    1.1.3 Summary
  1.2 Problems
    1.2.1 Application Point of View
    1.2.2 Architecture Point of View
    1.2.3 Summary
  1.3 Requirements
  1.4 Techniques
    1.4.1 Hardwired Network on Chip
    1.4.2 Design Flow to Bind Applications on FPGA
    1.4.3 Composable and Persistent-State Dynamic Reconfiguration using 3-Tier Model
    1.4.4 Online FPGA Testing
    1.4.5 Summary
  1.5 Problem Statement
  1.6 Thesis Organisation
  1.7 Thesis Contributions

2 Background on FPGA & Networks on Chip
  2.1 Background: Field Programmable Gate Array
    2.1.1 FPGA Architecture
    2.1.2 FPGA Design Flow
  2.2 Background: Networks on Chip
    2.2.1 NoC Architecture
    2.2.2 NoC Design Flow
  2.3 Conclusions

3 Proposed Solution and Related Work
  3.1 Proposed Solution: FPGA with Hardwired NoC
    3.1.1 Proposed Architecture
    3.1.2 Proposed Design Flow
  3.2 Technique: Hardwired Network on Chip
    3.2.1 Overview
    3.2.2 Motivation
    3.2.3 Related Work on Conventional FPGA with Soft & Hard Interconnect
    3.2.4 Positioning with the State of the Art
    3.2.5 Related Work on Custom Reconfigurable Architectures
    3.2.6 Positioning with the State of the Art
  3.3 Technique: Binding of Applications to FPGA
    3.3.1 Overview
    3.3.2 Motivation
    3.3.3 Related Work
    3.3.4 Positioning with the State of the Art
  3.4 Technique: Composable and Persistent-State Dynamic Reconfiguration
    3.4.1 Overview
    3.4.2 Motivation
    3.4.3 Related Work
    3.4.4 Positioning with the State of the Art
  3.5 Technique: Online Testing
    3.5.1 Overview
    3.5.2 Motivation
    3.5.3 Related Work
    3.5.4 Positioning with the State of the Art
  3.6 Conclusions

4 FPGA Architecture with a Hardwired Network on Chip
  4.1 Overview
  4.2 Hardwired NoC Architecture
  4.3 Test Configuration Functional Region Architecture
    4.3.1 Minimum Test Configuration Regions
    4.3.2 Bus Macros
    4.3.3 Clock Domain Crossing FIFOs
    4.3.4 Bitstream Manager
    4.3.5 Clock / Reset Manager
  4.4 Control Processor Architecture
  4.5 Hard Soft Partitioning
    4.5.1 Hardwired NoC Partitioning
    4.5.2 TCFR Partitioning
    4.5.3 Control Processor Partitioning
  4.6 Implementation versus Modeling
    4.6.1 Hardwired NoC Implementation versus Modeling
    4.6.2 TCFR Implementation versus Modeling
    4.6.3 Control Processor Implementation versus Modeling
  4.7 Hardwired NoC Extensions
    4.7.1 Soft & Multi FPGA NoC
    4.7.2 Applicability Extensions
  4.8 Architectural Limitations
  4.9 Results and Analysis
    4.9.1 Network Interface Variations
    4.9.2 Router Variations
    4.9.3 Test Configuration Functional Region Variations
    4.9.4 Design Space Exploration with Constant TCFR Size
    4.9.5 Design Space Exploration with Variable TCFR Size
    4.9.6 Area & Functional Performance Comparison of Soft & Hard NoC
  4.10 Conclusions

5 Preparing the FPGA System at Compile Time
  5.1 Architecture and Application Specifications
    5.1.1 Architecture Specifications
    5.1.2 Application Specifications
    5.1.3 Required Objectives
  5.2 PUMA: (Road to) Unified Placement, Mapping, and Allocation
    5.2.1 Preprocessing: Database Creation
    5.2.2 Traversing the Application and Creating Clusters
    5.2.3 Solution Space Extraction
    5.2.4 Candidate Solution Finding
    5.2.5 Solution Construction
    5.2.6 Cluster Resource Reservation
  5.3 Limitations
  5.4 Results And Analysis
    5.4.1 Performance: Success Rate
    5.4.2 PUMA Scalability
  5.5 Conclusions

6 Run-Time FPGA System Adaptation
  6.1 System Configuration & Programming: Overview
    6.1.1 FPGA With Soft Interconnect
    6.1.2 FPGA With Hard Interconnect
    6.1.3 Summary
  6.2 3-Tier Model for Composable & Persistent-State Run-Time Reconfiguration
    6.2.1 Responsibilities Across the 3 Tiers
    6.2.2 Enforcing the Inter-Application Composability
    6.2.3 Run Time Application Reconfiguration
    6.2.4 Assuring the Intra-Application Persistent-State Transition
    6.2.5 Summary
  6.3 Limitations
  6.4 Evaluation and Results
    6.4.1 Configuration, Programming, & Functional: Comparison
    6.4.2 Conventional and Proposed Architecture Comparison for Larger Systems
  6.5 Conclusions

7 Online Testing of FPGA Architecture
  7.1 The Test Methodology
    7.1.1 TCFR Testing
    7.1.2 Perform HWNoC Test
  7.2 Limitations
