ACM Transactions on Design Automation of Electronic Systems (April)

Scheduling and Optimal Register Placement for Synchronous Circuits Derived Using Software Pipelining Techniques

NOUREDDINE CHABINI
Royal Military College of Canada
EL MOSTAPHA ABOULHAMID
Université de Montréal
ISMAÏL CHABINI
Massachusetts Institute of Technology
and
YVON SAVARIA
École Polytechnique de Montréal

Data dependency constraints constitute a lower bound P on the minimal clock period of single-phase clocked sequential circuits. In contrast to methods based on basic retiming, clocked sequential circuits with clock period P can always be obtained using software pipelining techniques. Such circuits can be derived by any method that can be framed in the following four-step process: Step 1, determine P; Step 2, compute a valid periodic schedule of the computational elements; Step 3, place registers back to the circuit; Step 4, assign the clock signals to control registers.

Methods with polynomial run-time to implement this process are proposed in the literature. They implement these steps sequentially, starting with Step 1. These methods do not know how to optimally place registers, which leads to an unnecessary number of registers. In this article, we address the problem of how to simultaneously implement Steps 2 and 3 in order to minimize the total number of registers. We conjecture that the problem is NP-hard in its general form. We formulate the problem for the first time in the literature, and devise a Mixed Integer Linear Program (MILP) to solve it. From this MILP, we derive a linear program to determine approximate solutions to the problem for large general circuits. We show that the proposed approach can handle nonzero clock skew. Experimental results confirm the effectiveness of the approach and show that significant reductions of the number of registers can be obtained although register sharing is not used. When the schedule is given, the proposed approach provides solutions to the problem of how to place the minimal number of registers in Step 3.

Categories and Subject Descriptors: B. [Hardware]
General Terms: Algorithms, Performance
Additional Key Words and Phrases: Retiming, software pipelining, multiphase, clock, sequential circuit

This research benefited from financial support from Le Fonds Nature et Technologies (Quebec, Canada), NSF (USA), NSERC (Canada).

Authors' addresses: N. Chabini, Department of Electrical and Computer Engineering, Royal Military College of Canada, PO Box 17000, Station Forces, Kingston, On, Canada, K7K 7B4; email: [email protected]; E. M. Aboulhamid, DIRO, Université de Montréal, C.P. 6128, Suc. Centre-ville, Montréal, Qc, Canada, H3C 3J7; email: [email protected]; I. Chabini, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Room 1-263, Cambridge, MA, USA, 02139; email: [email protected]; Y. Savaria, Department of Electrical Engineering, École Polytechnique de Montréal, C.P. 6079, Suc. Centre-ville, Montréal, Qc, Canada, H3C 3A7; email: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, [email protected].

© 2005 ACM 1084-4309/05/0400-0187 $5.00

ACM Transactions on Design Automation of Electronic Systems, Vol. 10, No. 2, April 2005, Pages 187–204.

1. INTRODUCTION

Data dependency constraints constitute a lower bound on the clock period of synchronous sequential circuits. This lower bound, denoted here P, can be determined by solving an instance of the well known Cost-to-Time Ratio Cycle Problem [Dasdan and Gupta 1998; Gerez et al. 1992; Lawler 1976] on the graph modeling the circuit.

Basic retiming has been proposed as an optimization technique for synchronous circuits [Leiserson and Saxe 1991]. This technique changes the location of registers in the circuit in order to achieve one of the following goals: i) minimizing the clock period, ii) minimizing the number of registers, or iii) minimizing the number of registers for a target clock period.

Basic retiming [Leiserson and Saxe 1991] may fail in transforming a given synchronous single-phase sequential circuit to another functionally equivalent clocked circuit with a clock period of value P. Indeed, as presented in Boyer et al. [2001a] and Lockyear and Ebeling [1994], basic retiming can transform the correlator circuit to another one with a minimal clock period of value 13, but a functionally equivalent circuit with a clock period of value P = 10 can be obtained as provided in those papers. Figure 1 presents two other circuits to show 1) that basic retiming can fail in producing circuits with a clock period of value P that is due to data dependency constraints only, and 2) how much reduction of the clock period one can obtain by using methods based on software pipelining techniques, like the method in Boyer et al. [2001a], instead of methods based on basic retiming. For the circuits in Figure 1, basic retiming gives a minimal clock period of value 60 for circuit #1, and 45x for circuit #2. Functionally equivalent circuits with clock period P = 45 can be obtained using, for instance, the method in Boyer et al. [2001a]. This reduces the clock period by 25% for circuit #1 and by ((x−1)/x) for circuit #2 (for instance, when x = 2, this reduction is 50%).
When basic retiming fails to obtain a circuit with clock period of value P, then a functionally equivalent circuit with clock period of value P can be obtained with the penalty of increasing the number of clocks (phases), and such a circuit is then called a multiphase clocked sequential circuit. For instance, the correlator produced in Boyer et al. [2001a] and Lockyear and Ebeling [1994] is a two-phase circuit. Details on multiphase clocked sequential circuits can be found, for instance, in Ishii et al. [1997] and Lockyear and Ebeling [1994].

Fig. 1. Examples to show that basic retiming can fail in minimizing the clock period.

Methods to transform single-phase clocked sequential circuits to functionally equivalent ones with the clock period as close as possible to P are proposed in Legl et al. [1997], Ishii et al. [1997], Lockyear and Ebeling [1994], and Maheshwari and Sapatnekar [1999]. In Legl et al. [1997], basic retiming has been extended to deal with circuits whose registers are not enabled at the same time. The idea is that registers controlled by the same phase can be moved across computational elements.

It is known that with level-sensitive storage elements (latches), clocked circuits can be made faster and smaller [Ishii et al. 1997; Lockyear and Ebeling 1994] than with edge-triggered flip-flops. In Ishii et al. [1997], methods to minimize the clock period of multiphase level-sensitive clocked circuits are provided. Also, procedures to derive these kinds of circuits from edge-triggered ones are presented. In Lockyear and Ebeling [1994] and Maheshwari and Sapatnekar [1999], retiming with multiphase clocks is proposed. For the methods in Lockyear and Ebeling [1994] and Maheshwari and Sapatnekar [1999], the phases are fixed before retiming, which can give a clock period of value P only if good phases are chosen.

Clock skew is defined as the maximum difference of the delays from the clock source to the clock-pins on storage elements [Tsay 1993]. Clock skew can cause malfunction of clocked circuits.
Methods to ensure zero-skew in the design are reported in Li and Jabori [1992] and Tsay [1993]. However, skews are sometimes used as a tool to improve the performance of clocked circuits [Fishburn 1990; Deokar and Sapatnekar 1995; Sapatnekar and Deokar 1996]. In Fishburn [1990], two linear programs are presented to solve the problem of finding skews to minimize the clock period and the problem of maximizing skews for a target clock period. The equivalence between clock skew and retiming was first reported in Fishburn [1990], and a formal proof is provided in Deokar and Sapatnekar [1995]. For the work in Deokar and Sapatnekar [1995], a clock skew optimization problem is first solved with the objective of minimizing the clock period. Then, the obtained skews are transformed to retiming by moving some flip-flops across combinational blocks. For single-phase clocked circuits, a mixed integer linear program to combine retiming and clock skew is devised in Friedman et al. [1999] and Liu et al. [2002]. As presented in Boyer et al. [2001b], the tolerance to the clock skew for clocked circuits can be improved by using latches instead of flip-flops. The paper shows that, for multiphase clocked circuits operating at the minimal clock period P, the maximum tolerance to clock skew is $(P - D_{\max})/4$, where $D_{\max}$ is the propagation delay of the slowest computational element in the circuit.

Software pipelining is a powerful technique for increasing the instruction-level parallelism for parallel processors. This method overlaps the execution of successive iterations in order to reduce the difference of their start execution times. For an introduction to software pipelining and to its related techniques, the reader is referred to Allan et al. [1995].
To the best of our knowledge, no method based on basic retiming can always transform single-phase clocked sequential circuits to functionally equivalent clocked sequential circuits with a minimal clock period P that is due to data dependency constraints only. Methods based on software pipelining techniques to obtain the latter circuits have been recently proposed [Boyer et al. 2001a, b; Chabini et al. 2001]. Neither the number of phases nor the kind of memory elements to be used is fixed in advance in Boyer et al. [2001a] and Chabini et al. [2001], compared to some published approaches like Lockyear and Ebeling [1994] and Maheshwari and Sapatnekar [1999] that we reviewed previously. As mentioned, the methods in Lockyear and Ebeling [1994] and Maheshwari and Sapatnekar [1999] can produce circuits with clock period equal to P only if good phases are chosen, while the methods in Boyer et al. [2001a] and Chabini et al. [2001] are always able to obtain circuits that operate at P. The latter methods can be framed in the following process.

Step 1: Determine the minimal value P of the clock period due to data dependency constraints only.
Step 2: Compute a valid periodic schedule of the computational elements with period P.
Step 3: Place registers in the circuit according to the computed schedule.
Step 4: Determine the phases to control registers.

The method in Boyer et al. [2001a] implements this process sequentially, starting from Step 1. For Step 2, only As Soon As Possible (ASAP) or As Late As Possible (ALAP) schedules are computed. As presented in Chabini et al. [2001], using ASAP or ALAP schedules can lead to circuits with an unnecessary number of registers or phases. For Step 3, this method uses a heuristic, which again can lead to an unnecessary number of registers or phases.

The paper Chabini et al. [2001] has provided two methods with polynomial run-time to determine schedules for reducing register requirements and the number of required phases. Compared to Boyer et al. [2001a], these methods proved very efficient in reducing the number of registers and the number of required phases. Nevertheless, the problem of how to efficiently place registers in the circuit is not addressed in Chabini et al. [2001].

For software pipelining in the context of loops, methods for scheduling under register constraints to generate the code for parallel processors have been examined in the literature. However, it was assumed that processors are single-phase clocked. Circuits derived from the previously described process can be multiphase. Hence, these methods cannot be used to simultaneously implement Steps 2 and 3 of the process.

In this article, we address the problem of how to simultaneously implement Steps 2 and 3 of the process in order to minimize the number of registers. We propose the first formulation in the literature for this problem, from which we derive a mixed integer linear program (MILP). We conjecture that the problem is NP-hard in its general form. Linear Programs (LPs) are solvable in polynomial run-time. From this MILP, we derive an LP to determine approximate solutions to the problem for large general circuits. Furthermore, we present how the proposed approach can handle nonzero clock skew. To test the effectiveness of the approach in minimizing the number of registers, we apply the MILP and the LP on well-known benchmarks and show the superiority of that approach over the method in Boyer et al. [2001a]. The assessment of the approach is also done in the case of nonzero clock skew, and the obtained results show the superiority of the approach over the method in Boyer et al. [2001b]. We compare our experimental results to Boyer et al. [2001a, b] since, to the best of our knowledge, there are no other papers at this moment that are close to the issue we address here.

The next section gives some notations and definitions used in this article.
Section 3 introduces the means of register placement, briefly reviews the register placement step in the method of Boyer et al. [2001a], presents how the phases to control registers are computed, and shows that the algorithm proposed in Boyer et al. [2001a] to place registers is not exact. The problem we address and its formulation are presented in Section 4. A linear program to determine approximate solutions for this problem is given in Section 5. Section 6 presents how the proposed approach can handle nonzero clock skew and gives a theoretical result. Experimental results are provided in Section 7, and Section 8 concludes the article.

2. PRELIMINARIES

2.1 Design Representation

The input to our approach in this article is a single-phase synchronous sequential circuit as the one in Figure 2(a). As in Boyer et al. [2001a], Maheshwari and Sapatnekar [1997], Shenoy and Rudell [1994], and Leiserson and Saxe [1991], we model the input circuit by a directed cyclic graph $G = (V, E, d, w)$, where V is the set of computational elements in the circuit, and E is the set of edges, which represent interconnections between vertices. Let $\mathbb{N}$ be the set of nonnegative integers. Each vertex $v \in V$ has a propagation delay $d(v) \in \mathbb{N}$ which is assumed to be fixed in this article. Each edge $e_{u,v}$, from u to v, in E is weighted with a register count $w(e_{u,v}) \in \mathbb{N}$, representing the number of registers on the wire between u and v.

Fig. 2. Sample circuit and its directed cyclic graph model.

As in Boyer et al. [2001a], Maheshwari and Sapatnekar [1997], Shenoy and Rudell [1994], and Leiserson and Saxe [1991], propagation delays of registers and wires are assumed to be equal to zero. We believe that this delay model is acceptable at the high-level abstraction of the design, but not when computational elements are, for instance, transistors. Even though we assume this delay model, the problem we address in the article is still complex.
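As an illustration of the graph model $G = (V, E, d, w)$ of Section 2.1, the following minimal sketch stores d and w as dictionaries and checks that both take values in the nonnegative integers. The class name `CircuitGraph`, its field names, and the three-element example are assumptions made for this sketch; they are not taken from the article, and the example is not the circuit of Figure 2. Keying edges by `(u, v)` also assumes at most one wire per ordered pair of elements.

```python
from dataclasses import dataclass


@dataclass
class CircuitGraph:
    """Directed graph G = (V, E, d, w) for a single-phase synchronous circuit."""

    delay: dict      # d: vertex name -> propagation delay (nonnegative int)
    registers: dict  # w: (u, v) -> number of registers on the wire from u to v

    def __post_init__(self):
        # Both d and w must map into the nonnegative integers.
        for v, dv in self.delay.items():
            if not isinstance(dv, int) or dv < 0:
                raise ValueError(f"delay of {v!r} must be a nonnegative integer")
        for (u, v), count in self.registers.items():
            if u not in self.delay or v not in self.delay:
                raise ValueError(f"edge ({u!r}, {v!r}) uses an unknown vertex")
            if not isinstance(count, int) or count < 0:
                raise ValueError(f"w({u!r}, {v!r}) must be a nonnegative integer")

    def total_registers(self):
        """Sum of w(e) over all edges of the graph."""
        return sum(self.registers.values())


# Hypothetical three-element circuit (NOT the circuit of Figure 2).
example = CircuitGraph(
    delay={"a": 10, "b": 20, "c": 30},
    registers={("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 0, ("c", "a"): 1},
)
```

Registers and wires carry no delay in this representation, which matches the zero-delay assumption stated above.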
Figures 2(a) and 2(b) present an example of a single-phase synchronous sequential circuit and its directed cyclic graph model, respectively. In Figure 2(a), large rectangles represent computational elements and small rectangles represent registers. Wires are oriented to show the propagation direction of the signals. The propagation delay of each computational element of this circuit is specified as a label on the left of each large rectangle. This example will be used throughout this article, and will serve to illustrate the initial specification of the problem to be solved. Without any optimization, the minimum clock period of the circuit in Figure 2 is 80, which is equal to $d(v_5) + d(v_1) + d(v_3)$.

2.2 Periodic Schedules

We define a schedule s [Bennour 1996; Boyer et al. 2001a] as a function $s: \mathbb{N} \times V \to \mathbb{Q}$, where $s_n(v) \equiv s(n, v)$ denotes the schedule time of the nth iteration of operation v. In multiphase flip-flop-based circuits, the schedule time of operation v is the start execution time of v. A schedule s is called periodic with period P, if:

\[ \forall n \in \mathbb{N},\ \forall v \in V:\quad s_{n+1}(v) = s_n(v) + P. \tag{1} \]

When there is no resource constraint, a schedule s is said to be valid if and only if the operations terminate before their results are needed. In this case, we say that data dependencies are satisfied, which is equivalent to the following mathematical inequality:

\[ \forall n \in \mathbb{N},\ \forall e_{u,v} \in E:\quad s_{n+w(e_{u,v})}(v) \ge s_n(u) + d(u). \tag{2} \]

2.3 Maximum Throughput of Synchronous Sequential Circuits

Let C be the set of directed cycles in the directed cyclic graph modeling the circuit. Based on data dependency constraints only, the maximum throughput, denoted T, is given by the following expression [Bennour 1996; Bennour and Aboulhamid 1995]:

\[ T = \min_{c \in C} \left( \left( \sum_{e_{u,v} \in c} w(e_{u,v}) \right) \Bigg/ \left( \sum_{\forall v \in V \text{ and } e_{u,v} \in c} d(u) \right) \right) \tag{3} \]

Determining the maximum throughput is a Minimal Cost-to-Time Ratio Cycle Problem [Gerez et al. 1992; Lawler 1976].
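For very small graphs, expression (3) can be evaluated directly by enumerating the simple directed cycles and taking the minimum ratio of registers to delay. The sketch below does exactly that; it is an illustration only, since cycle enumeration takes exponential time in general and is not the method cited in the article. The function names and the three-vertex graph are hypothetical (the graph is not the circuit of Figure 2), every cycle is assumed to have a positive total delay, and edges are assumed to be unique per ordered pair `(u, v)`.

```python
from fractions import Fraction


def simple_cycles(vertices, edges):
    """Yield each simple directed cycle once, as a list of edges (u, v)."""
    order = {v: i for i, v in enumerate(vertices)}
    succ = {v: [] for v in vertices}
    for (u, v) in edges:
        succ[u].append(v)

    def extend(start, path, on_path):
        for v in succ[path[-1]]:
            if v == start:
                # Closing edge found: report the cycle as consecutive edges.
                yield list(zip(path, path[1:] + [start]))
            elif v not in on_path and order[v] > order[start]:
                # Only visit vertices ranked after `start`, so each cycle is
                # reported once, rooted at its lowest-ranked vertex.
                on_path.add(v)
                yield from extend(start, path + [v], on_path)
                on_path.remove(v)

    for s in vertices:
        yield from extend(s, [s], {s})


def max_throughput(d, w):
    """Expression (3): min over cycles of (sum of w(e)) / (sum of d(u))."""
    best = None
    for cycle in simple_cycles(list(d), list(w)):
        regs = sum(w[e] for e in cycle)
        delay = sum(d[u] for (u, _v) in cycle)  # assumed > 0 on every cycle
        ratio = Fraction(regs, delay)
        if best is None or ratio < best:
            best = ratio
    return best


# Hypothetical graph (NOT Figure 2): delays d and register counts w.
d = {"a": 10, "b": 20, "c": 30}
w = {("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 0, ("c", "a"): 1}

T = max_throughput(d, w)
P = 1 / T  # minimal clock period due to data dependencies only
```

In this hypothetical graph the cycle a, b, a has ratio 1/30 and the cycle a, b, c, a has ratio 1/60, so the second cycle is the critical one and gives P = 60.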
This problem can be solved in the general case with a run-time in $O(|V||E| \log(|V| d_{\max}))$ [Dasdan and Gupta 1998; Lawler 1976], where $d_{\max} = \max_{v \in V}(d(v))$. A possible method to solve this problem is to iteratively apply Bellman-Ford's algorithm [Cormen et al. 1990] for longest paths on the graph $G_P = (V, E, d, w_P)$ derived from G by letting:

\[ w_P(e_{u,v}) = d(u) - P \cdot w(e_{u,v}), \tag{4} \]

where $e_{u,v} \in E$ and $P = 1/T$. A binary search may be used to find the minimal value of P for which there is no positive cycle in $G_P$ [Bennour 1996; Bennour and Aboulhamid 1995]. Without loss of generality, for circuits that do not attempt to perform wave pipelining, we assume that P is greater than or equal to the propagation delay of each computational element in the circuit.

By applying expression (3) on the example circuit in Figure 2, the value of P is 60. This value corresponds to the cycle defined by vertices $v_1$, $v_2$, $v_4$, and $v_5$. Notice that applying basic retiming for minimal clock period on that circuit leads to a larger value of P. Indeed, it leads to P = 70.

2.4 Periodic Schedule for a Given Period

From equation (1) and inequality (2), we have that:

\[ \forall e_{u,v} \in E:\quad s_0(v) - s_0(u) \ge d(u) - P \cdot w(e_{u,v}). \tag{5} \]

In the case of periodic schedules, determining a valid schedule of all the instances of each vertex v in V is equivalent to determining $s_0(v)$ for each v in V, which consists of finding a solution to the system of inequalities described by (5). To solve this system, the graph $G_P$ described in the previous section may be used. Note that ASAP and ALAP schedules are possible solutions to this system. To find an ASAP schedule, Bellman-Ford's algorithm [Cormen et al. 1990] for longest paths, from a chosen vertex $v_x$ to the other vertices, may be applied on the graph $G_P$. Finding an ALAP schedule may be done as follows. In Step 1, a graph $G'_P$ has to be derived from $G_P$ by inverting the direction of each edge in $G_P$.
In Step 2, Bellman-Ford's algorithm for longest paths, from the vertex $v_x$ to the other vertices, has to be applied on the graph $G'_P$, where the weights of its edges are defined by Equation (4). Finally, in Step 3, the ALAP schedule is obtained by multiplying each result in Step 2 by −1. Relative to $v_x = v_1$, the ASAP schedules of vertices $v_1$, $v_2$, $v_3$, $v_4$, $v_5$, and $v_6$ of the circuit in Figure 2 are 0, −30, 30, −10, −40, and −30, respectively. Their ALAP schedules are 0, −30, 40, −10, −40, and 10, respectively.

Fig. 3. Schedule graph.

2.5 Schedule Graph

A periodic schedule, with period P, is expressed by a schedule graph $G_s = (V, E, d, T_s, P)$ [Boyer et al. 2001a]. Here V, E, and d have the same definition given for the case of the graph G previously defined. $T_s: E \to \mathbb{Q}$ is a weight function which associates to each edge $e_{u,v}$ in E the time distance between the schedule times of u and v. Mathematically, $T_s(e_{u,v})$ is defined as follows:

\[ \forall e_{u,v} \in E:\quad T_s(e_{u,v}) = s_{w(e_{u,v})}(v) - s_0(u). \tag{6} \]

Because s is periodic with period P, Equation (6) may be rewritten as follows:

\[ \forall e_{u,v} \in E:\quad T_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v}). \tag{7} \]

The graph $G_s$ is consistent if and only if for each edge $e_{u,v}$ in E, $T_s(e_{u,v}) \ge d(u)$. This is derived from Equation (2). Figure 3 shows a consistent schedule graph, where edges are labeled with $T_s$ values for the circuit in Figure 2, using the ASAP schedule determined in Section 2.4. The weight of each arc in the schedule graph is expressed in units of time.

3. REGISTER PLACEMENT AND ASSIGNMENT OF PHASES

For circuits optimized using basic retiming [Leiserson and Saxe 1991], registers are placed in the optimized circuit using the following formula:

\[ \forall e_{u,v} \in E:\quad w_r(e_{u,v}) = r(v) - r(u) + w(e_{u,v}), \]

where $w_r(e_{u,v})$ and $w(e_{u,v})$ are, respectively, the number of registers on the arc $e_{u,v}$ after and before retiming, and r(u) is the value assigned by basic retiming to each computational element u in the circuit.
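Looking back at Sections 2.4 and 2.5, the ASAP computation and the schedule-graph weights can be sketched in a few lines. The code below runs Bellman-Ford for longest paths on $G_P$ with the edge weights of Equation (4), then evaluates Equation (7) and the consistency condition $T_s(e_{u,v}) \ge d(u)$. It is a minimal sketch under stated assumptions: the function names and the three-vertex graph are hypothetical (the graph is not the circuit of Figure 2), and every vertex is assumed to be reachable from the chosen source vertex.

```python
import math


def asap_schedule(d, w, period, source):
    """Longest paths from `source` in G_P, i.e. s_0(v) for an ASAP schedule."""
    # Equation (4): w_P(e_uv) = d(u) - P * w(e_uv).
    wp = {(u, v): d[u] - period * regs for (u, v), regs in w.items()}
    s0 = {v: -math.inf for v in d}
    s0[source] = 0
    for _ in range(len(d) - 1):          # Bellman-Ford relaxation passes
        for (u, v), weight in wp.items():
            if s0[u] + weight > s0[v]:
                s0[v] = s0[u] + weight
    for (u, v), weight in wp.items():    # one more pass detects positive cycles
        if s0[u] + weight > s0[v]:
            raise ValueError("positive cycle in G_P: the period is too small")
    if any(math.isinf(t) for t in s0.values()):
        raise ValueError("some vertex is not reachable from the source")
    return s0


def schedule_graph_weights(s0, w, period):
    """Equation (7): T_s(e_uv) = s_0(v) - s_0(u) + P * w(e_uv)."""
    return {(u, v): s0[v] - s0[u] + period * regs for (u, v), regs in w.items()}


def is_consistent(ts, d):
    """G_s is consistent iff T_s(e_uv) >= d(u) on every edge."""
    return all(t >= d[u] for (u, _v), t in ts.items())


# Hypothetical graph (NOT Figure 2), with period P = 60.
d = {"a": 10, "b": 20, "c": 30}
w = {("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 0, ("c", "a"): 1}

s0 = asap_schedule(d, w, period=60, source="a")
ts = schedule_graph_weights(s0, w, period=60)
```

An ALAP schedule follows the same pattern on the graph with reversed edges, with the results negated, as described in Section 2.4. Wrapping `asap_schedule` in a binary search over the period, and treating the `ValueError` for a positive cycle as "period too small", is one way to realize the search for the minimal P mentioned in Section 2.3.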
In the rest of this section, we show how registers can be placed and controlled in circuits derived by the process we presented in Section 1. To this end, we review the method in Boyer et al. [2001a], which is a possible implementation of that process. The approach we are proposing in this article leads to better implementations of the process.

For the method proposed in Boyer et al. [2001a], registers are placed back to the circuit by pipelining the schedule graph $G_s$ defined in Section 2.5. Every path in $G_s$ that is longer than the minimal clock period P is broken by inserting registers on it. For paths having a length (in units of time) less than P, no register is required if operation chaining is assumed.

Fig. 4. Placement and phases of registers using algorithm in Boyer et al. [2001a].

For synchronous single-phase sequential circuits, registers are controlled by the same signal, called the clock. When clock skew is not supported, registers in that case must receive the clock at the same moment. In synchronous multiphase sequential circuits, registers are not necessarily controlled by the same clock. In this case, the clocks can have the same period and be defined relatively to a global clock that can be one of those clocks. Each clock is then an offset of the global clock. That offset is called the phase in the literature.

Circuits derived by the process we presented in Section 1 can be multiphase, and all the clocks have the same period. In the case of the method in Boyer et al. [2001a], which is a possible implementation of the process, once registers are placed, the phases to control them are then computed as follows. The phase of a register on the input of a computational element v is ($s_0(v)$ modulo P), where $s_0(v)$ is the schedule of v, and P is the minimal clock period due to data dependency constraints only.
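The phase rule above is a one-line computation. The sketch below applies ($s_0(v)$ modulo P) to the ASAP schedule values quoted in Section 2.4 for the circuit of Figure 2, with P = 60. The function name `register_phases` is an assumption of this sketch. It evaluates the rule for every vertex; which inputs actually carry a register depends on the placement shown in Figure 4, which is not reproduced here.

```python
def register_phases(s0, period):
    """Phase of a register on the input of v: s_0(v) modulo P.

    Python's % returns a result in [0, period) for a positive period,
    even when s_0(v) is negative, which is the behavior needed here.
    """
    return {v: t % period for v, t in s0.items()}


# ASAP schedule values from Section 2.4 (relative to v_x = v_1), P = 60.
asap = {"v1": 0, "v2": -30, "v3": 30, "v4": -10, "v5": -40, "v6": -30}
phases = register_phases(asap, 60)
```

In languages where the remainder operator can return a negative value for a negative operand, the result would need to be shifted by P to land in the interval [0, P).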
Figures 4(a) and 4(b) present the placement of registers and their phases obtained using the algorithm provided in Boyer et al. [2001a] to place registers using the schedule graph depicted in Figure 3. The latter graph corresponds to the circuit in Figure 2 and is obtained as explained in Section 2.5. Data in Figure 4(c) is provided to assist the reader interested in computing the phases given in Figure 4(b). The number of registers that are placed in the circuit is 6, and the number of phases to control them is 4.

The algorithm for register placement in Boyer et al. [2001a] is not optimal in the sense that it does not use a minimum number of registers. Indeed, for Figure 4(a), register $R_1$ can be omitted since there is no combinational path longer than P between $R_4$ and $R_5$.

4. PROBLEM FORMULATION AND APPROACHES FOR ITS RESOLUTION

Our focus is to simultaneously realize Steps 2 and 3 in the process presented in Section 1 in order to minimize the number of registers. The problem we address in this article is then to determine a schedule with the minimum register requirements, where the register placement is done during the schedule determination. We do not support register sharing as in the case when basic retiming is used, since, in our case, the obtained circuits can be multiphase clocked sequential circuits, and, in this case, registers on the output of a computational element can be shared only if they are controlled by the same phase. However, once the registers are placed, one can examine the phases of registers on the output of each computational element to decide whether to share them.

Let us present the problem in a way that makes it easier to understand our approach in solving it. As explained in Section 3, the placement of registers consists in pipelining the schedule graph to obtain a circuit that can operate with the minimal clock period P. Recall that in Boyer et al. [2001a] the placement of registers is done once the schedule is computed. If the schedule is given, the problem transforms into a problem of pipelining the schedule graph, while using a minimal number of registers. The weight of each arc in the schedule graph is given by Equation (7) (i.e., $\forall e_{u,v} \in E$, $T_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v})$). Instead of fixing the schedule first, before pipelining the schedule graph, we want to make the schedule a variable in the problem and then to pipeline the resulting schedule graph.

We conjecture that the problem is NP-hard in its general form. We provide in this section a mathematical formulation (MF) to the problem. From this MF, we derive a mixed integer linear program (MILP) that can be used for solving the problem for special or small-size circuits. In Section 5, we derive from this MILP a linear program to determine approximate solutions to the problem for general large circuits.

Before presenting the details related to MF and MILP, let us first give some definitions and notations while introducing an informal formulation of the problem. Figure 5 gives a portion of the schedule graph to pipeline, where i and j are two computational elements. Unknown variables $x_{i,j}$ denote the number of registers that must be placed on the arc, $e_{i,j}$, to guarantee that the length, $l_{i,j}$, of every path that goes to j via i is less than or equal to the minimal clock period P. Variable $l_{i,j}$ will be defined in the following. Note that as in Boyer et al. [2001a], operation chaining is assumed, and hence no register is required if $l_{i,j} \le P$. Suppose that paths that go to j via i are already examined in order to determine if some registers must be placed on them or not. Let $m_i$ be a nonnegative real number greater than or equal to each remainder that is obtained by dividing the length of each one of those paths by P.
The length $l_{i,j}$ of every path that goes to j via i is the sum of $m_i$ and $T_s(e_{i,j})$, where $T_s(e_{i,j})$ is defined by Equation (7). Variable $y_{i,j}$ is the remainder of the division of $l_{i,j}$ by P. We require that $m_i \le (P - d(i))$ which guarantees that, if a register R is placed on the
