Scheduling and Optimal Register Placement
for Synchronous Circuits Derived Using
Software Pipelining Techniques
NOUREDDINE CHABINI
Royal Military College of Canada
EL MOSTAPHA ABOULHAMID
Université de Montréal
ISMAÏL CHABINI
Massachusetts Institute of Technology
and
YVON SAVARIA
École Polytechnique de Montréal
Data dependency constraints constitute a lower bound P on the minimal clock period of single-phase
clocked sequential circuits. In contrast to methods based on basic retiming, clocked sequential
circuits with clock period P can always be obtained using software pipelining techniques. Such
circuits can be derived by any method that can be framed in the following four-step process: Step 1,
determine P; Step 2, compute a valid periodic schedule of the computational elements; Step 3, place
registers back in the circuit; Step 4, assign the clock signals to control registers.
Methods with polynomial run-time to implement this process are proposed in the literature.
They implement these steps sequentially, starting with Step 1. These methods do not know how
to optimally place registers, which leads to more registers than necessary. In this article,
we address the problem of how to simultaneously implement Steps 2 and 3 in order to minimize
the total number of registers. We conjecture that the problem is NP-hard in its general form.
We formulate the problem for the first time in the literature, and devise a Mixed Integer Linear
Program (MILP) to solve it. From this MILP, we derive a linear program to determine approximate
This research benefited from financial support from Le Fonds Nature et Technologies (Quebec,
Canada), NSF (USA), NSERC (Canada).
Authors’ addresses: N. Chabini, Department of Electrical and Computer Engineering, Royal
Military College of Canada, PO Box 17000, Station Forces, Kingston, On, Canada, K7K 7B4;
email: chabini-n@rmc.ca; E. M. Aboulhamid, DIRO, Université de Montréal, C.P. 6128, Suc. Centre-ville,
Montréal, Qc, Canada, H3C 3J7; email: aboulham@iro.umontreal.ca; I. Chabini, Department
of Civil and Environmental Engineering, Massachusetts Institute of Technology, Room 1-263,
Cambridge, MA, USA, 02139; email: chabini@mit.edu; Y. Savaria, Department of Electrical Engineering,
École Polytechnique de Montréal, C.P. 6079, Suc. Centre-ville, Montréal, Qc, Canada,
H3C 3A7; email: savaria@vlsi.polymtl.ca.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along
with the full citation. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,
to redistribute to lists, or to use any component of this work in other works requires prior specific
permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515
Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.
© 2005 ACM 1084-4309/05/0400-0187 $5.00
ACM Transactions on Design Automation of Electronic Systems, Vol. 10, No. 2, April 2005, Pages 187–204.
solutions to the problem for large general circuits. We show that the proposed approach can handle
nonzero clock skew. Experimental results confirm the effectiveness of the approach and show that
significant reductions of the number of registers can be obtained although register sharing is not
used. When the schedule is given, the proposed approach provides solutions to the problem of how
to place the minimal number of registers in Step 3.
Categories and Subject Descriptors: B. [Hardware]
General Terms: Algorithms, Performance
Additional Key Words and Phrases: Retiming, software pipelining, multiphase, clock, sequential
circuit
1. INTRODUCTION
Data dependency constraints constitute a lower bound on the clock period of
synchronous sequential circuits. This lower bound, denoted here P, can be
determined by solving an instance of the well-known Cost-to-Time Ratio Cycle
Problem [Dasdan and Gupta 1998; Gerez et al. 1992; Lawler 1976] on the graph
modeling the circuit.
Basic retiming has been proposed as an optimization technique for synchronous
circuits [Leiserson and Saxe 1991]. This technique changes the location
of registers in the circuit in order to achieve one of the following goals:
(i) minimizing the clock period, (ii) minimizing the number of registers, or (iii)
minimizing the number of registers for a target clock period.
Basic retiming [Leiserson and Saxe 1991] may fail to transform a given
synchronous single-phase sequential circuit into another functionally equivalent
clocked circuit with a clock period of value P. Indeed, as presented in Boyer
et al. [2001a] and Lockyear and Ebeling [1994], basic retiming can transform
the correlator circuit into another one with a minimal clock period of value 13,
but a functionally equivalent circuit with a clock period of value P = 10 can be
obtained as shown in those papers. Figure 1 presents two other circuits to
show (1) that basic retiming can fail to produce circuits with a clock period of
value P that is due to data dependency constraints only, and (2) how
much reduction of the clock period one can obtain by using methods based on
software pipelining techniques, like the method in Boyer et al. [2001a], instead of
using methods based on basic retiming. For the circuits in Figure 1, basic retiming
gives a minimal clock period of value 60 for circuit #1, and 45x for circuit #2.
Functionally equivalent circuits with clock period P = 45 can be obtained using,
for instance, the method in Boyer et al. [2001a]. This reduces the clock period
by 25% for circuit #1 and by (x − 1)/x for circuit #2 (for instance, when x = 2,
this reduction is 50%).
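The percentages quoted for the two circuits follow from the relative reduction (retimed period − P) / retimed period. A minimal sketch in Python that reproduces the arithmetic; the function name `relative_reduction` is introduced here for illustration only and does not come from the article:

```python
from fractions import Fraction


def relative_reduction(retimed_period, pipelined_period):
    """Fraction of the clock period saved when moving from the period reached
    by basic retiming to the period P reached by software pipelining."""
    return Fraction(retimed_period - pipelined_period, retimed_period)


# Circuit #1: basic retiming gives 60, software pipelining gives P = 45.
print(relative_reduction(60, 45))       # 1/4, i.e., 25%

# Circuit #2: basic retiming gives 45x; with x = 2 the period drops from 90 to 45.
x = 2
print(relative_reduction(45 * x, 45))   # 1/2, i.e., (x - 1)/x = 50%
```

For circuit #2 the saving (x − 1)/x approaches 100% as x grows, which is why the gap between basic retiming and software pipelining can be made arbitrarily large on this family of examples.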
When basic retiming fails to obtain a circuit with clock period of value P, then
a functionally equivalent circuit with clock period of value P can be obtained
with the penalty of increasing the number of clocks (phases), and such a circuit is
then called a multiphase clocked sequential circuit. For instance, the correlator
produced in Boyer et al. [2001a] and Lockyear and Ebeling [1994] is a two-phase
circuit. Details on multiphase clocked sequential circuits can be found,
for instance, in Ishii et al. [1997] and Lockyear and Ebeling [1994].
Fig. 1. Examples to show that basic retiming can fail in minimizing the clock period.
Methods to transform single-phase clocked sequential circuits into functionally
equivalent ones with the clock period as close as possible to P are proposed
in Legl et al. [1997], Ishii et al. [1997], Lockyear and Ebeling [1994], and
Maheshwari and Sapatnekar [1999]. In Legl et al. [1997], basic retiming has
been extended to deal with circuits whose registers are not enabled at the same
time. The idea is that registers controlled by the same phase can be moved
across computational elements.
It is known that with level-sensitive storage elements (latches), clocked circuits
can be made faster and smaller [Ishii et al. 1997; Lockyear and Ebeling
1994] than with edge-triggered flip-flops. In Ishii et al. [1997], methods to minimize
the clock period of multiphase level-sensitive clocked circuits are provided.
Also, procedures to derive these kinds of circuits from edge-triggered ones are
presented. In Lockyear and Ebeling [1994] and Maheshwari and Sapatnekar
[1999], retiming with multiphase clocks is proposed. For the methods in Lockyear
and Ebeling [1994] and Maheshwari and Sapatnekar [1999], the phases are fixed
before retiming, which can give a clock period of value P only if good phases are chosen.
Clock skew is defined as the maximum difference of the delays from the
clock source to the clock pins on storage elements [Tsay 1993]. Clock skew can
cause malfunction of clocked circuits. Methods to ensure zero skew in the design
are reported in Li and Jabori [1992] and Tsay [1993]. However, skews
are sometimes used as a tool to improve the performance of clocked circuits
[Fishburn 1990; Deokar and Sapatnekar 1995; Sapatnekar and Deokar 1996].
In Fishburn [1990], two linear programs are presented to solve the problem
of finding skews to minimize the clock period and the problem of maximizing
skews for a target clock period. The equivalence between clock skew and retiming
was first reported in Fishburn [1990], and a formal proof is provided
in Deokar and Sapatnekar [1995]. In the work of Deokar and Sapatnekar
[1995], a clock skew optimization problem is first solved with the objective of
minimizing the clock period. Then, the obtained skews are transformed into a
retiming by moving some flip-flops across combinational blocks. For single-phase
clocked circuits, a mixed integer linear program to combine retiming and clock
skew is devised in Friedman et al. [1999] and Liu et al. [2002]. As presented in
Boyer et al. [2001b], the tolerance to clock skew for clocked circuits can be
improved by using latches instead of flip-flops. The paper shows that, for multiphase
clocked circuits operating at the minimal clock period P, the maximum
tolerance to clock skew is $(P - D_{\max})/4$, where $D_{\max}$ is the propagation delay of
the slowest computational element in the circuit.
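To make the quoted bound concrete, the following short Python sketch evaluates the maximum skew tolerance for given values of P and the largest propagation delay. The function name and the numeric values are hypothetical and serve only as an illustration of the formula attributed above to Boyer et al. [2001b]:

```python
def max_skew_tolerance(P, d_max):
    """Maximum tolerance to clock skew, (P - D_max) / 4, for a multiphase
    circuit operating at clock period P with slowest element delay d_max."""
    if d_max > P:
        raise ValueError("expected D_max <= P")
    return (P - d_max) / 4


# Hypothetical values, chosen only for illustration.
print(max_skew_tolerance(P=60, d_max=40))   # 5.0
```

The bound shrinks to zero when the slowest computational element already occupies the whole period, which matches the intuition that no slack is left to absorb skew in that case.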
Software pipelining is a powerful technique for increasing the instruction-level
parallelism for parallel processors. This method overlaps the execution of
successive iterations in order to reduce the difference of their start execution
times. For an introduction to software pipelining and to its related techniques,
the reader is referred to Allan et al. [1995].
To the best of our knowledge, no method based on basic retiming can always
transform single-phase clocked sequential circuits into functionally equivalent
clocked sequential circuits with a minimal clock period P that is due to data
dependency constraints only. Methods based on software pipelining techniques
to obtain the latter circuits have been recently proposed [Boyer et al. 2001a,
b; Chabini et al. 2001]. Neither the number of phases nor the kind of memory
elements to be used is fixed in advance in Boyer et al. [2001a] and Chabini
et al. [2001], in contrast to some published approaches, like Lockyear and Ebeling
[1994] and Maheshwari and Sapatnekar [1999], that we reviewed previously. As
mentioned, the methods in Lockyear and Ebeling [1994] and Maheshwari and
Sapatnekar [1999] can produce circuits with clock period equal to P only if good
phases are chosen, while the methods in Boyer et al. [2001a] and Chabini et al.
[2001] are always able to obtain circuits that operate at P. The latter methods
can be framed in the following process.
Step 1: Determine the minimal value P of the clock period due to data dependency
constraints only.
Step 2: Compute a valid periodic schedule of the computational elements with
period P.
Step 3: Place registers in the circuit according to the computed schedule.
Step 4: Determine the phases to control registers.
The method in Boyer et al. [2001a] implements this process sequentially,
starting from Step 1. For Step 2, only As Soon As Possible (ASAP) or As Late As
Possible (ALAP) schedules are computed. As presented in Chabini et al. [2001],
using ASAP or ALAP schedules can lead to circuits with an unnecessary number
of registers or phases. For Step 3, this method uses a heuristic, which again can
lead to an unnecessary number of registers or phases.
Chabini et al. [2001] provided two methods with polynomial
run-time to determine schedules for reducing register requirements and the
number of required phases. Compared to Boyer et al. [2001a], these methods
proved very efficient in reducing the number of registers and the number of
required phases. Nevertheless, the problem of how to efficiently place registers
in the circuit is not addressed in Chabini et al. [2001].
For software pipelining in the context of loops, methods for scheduling under
register constraints to generate the code for parallel processors have been
examined in the literature. However, it was assumed that processors are single-phase
clocked. Circuits derived from the previously described process can be multiphase.
Hence, these methods cannot be used to simultaneously implement Steps
2 and 3 of the process.
In this article, we address the problem of how to simultaneously implement
Steps 2 and 3 of the process in order to minimize the number of registers. We
propose the first formulation in the literature for this problem, from which we
derive a mixed integer linear program (MILP). We conjecture that the problem
is NP-hard in its general form. Linear Programs (LPs) are solvable in polynomial
run-time. From this MILP, we derive an LP to determine approximate
solutions to the problem for large general circuits. Furthermore, we present
how the proposed approach can handle nonzero clock skew. To test the effectiveness
of the approach in minimizing the number of registers, we apply the
MILP and the LP to well-known benchmarks and show the superiority of that
approach over the method in Boyer et al. [2001a]. The assessment of the approach
is also done in the case of nonzero clock skew, and the obtained results
show the superiority of the approach over the method in Boyer et al. [2001b].
We compare our experimental results to Boyer et al. [2001a, b] since, to the best
of our knowledge, there are no other papers at this moment that are close to
the issue we address here.
The next section gives some notations and definitions used in this article.
Section 3 introduces the means of register placement, briefly reviews the register
placement step in the method of Boyer et al. [2001a], presents how the
phases to control registers are computed, and shows that the algorithm proposed
in Boyer et al. [2001a] to place registers is not exact. The problem we
address and its formulation are presented in Section 4. A linear program to determine
approximate solutions for this problem is given in Section 5. Section 6
presents how the proposed approach can handle nonzero clock skew and gives a
theoretical result. Experimental results are provided in Section 7, and Section 8
concludes the article.
2. PRELIMINARIES
2.1 Design Representation
The input to our approach in this article is a single-phase synchronous sequential
circuit such as the one in Figure 2(a). As in Boyer et al. [2001a], Maheshwari and
Sapatnekar [1997], Shenoy and Rudell [1994], and Leiserson and Saxe [1991],
we model the input circuit by a directed cyclic graph G = (V, E, d, w), where V is
the set of computational elements in the circuit, and E is the set of edges, which
represent interconnections between vertices. Let N be the set of nonnegative
integers. Each vertex v ∈ V has a propagation delay d(v) ∈ N, which is assumed
to be fixed in this article. Each edge $e_{u,v}$, from u to v, in E is weighted with
a register count $w(e_{u,v}) \in N$, representing the number of registers on the wire
between u and v.
As in Boyer et al. [2001a], Maheshwari and Sapatnekar [1997], Shenoy and
Rudell [1994], and Leiserson and Saxe [1991], propagation delays of registers and
wires are assumed to be equal to zero. We believe that this delay model is
Fig. 2. Sample circuit and its directed cyclic graph model.
acceptable at the high-level abstraction of the design, but not when computational
elements are, for instance, transistors. Even though we assume this
delay model, the problem we address in the article is still complex.
Figures 2(a) and 2(b) present an example of a single-phase synchronous sequential
circuit and its directed cyclic graph model, respectively. In Figure 2(a),
large rectangles represent computational elements and small rectangles represent
registers. Wires are oriented to show the propagation direction of the
signals. The propagation delay of each computational element of this circuit is
specified as a label on the left of each large rectangle. This example will be used
throughout this article, and will serve to illustrate the initial specification of the
problem to be solved. Without any optimization, the minimum clock period of
the circuit in Figure 2 is 80, which is equal to $d(v_5) + d(v_1) + d(v_3)$.
2.2 Periodic Schedules
We define a schedule s [Bennour 1996; Boyer et al. 2001a] as a function
$s : N \times V \to Q$, where $s_n(v) \equiv s(n, v)$ denotes the schedule time of the nth iteration of
operation v. In multiphase flip-flop-based circuits, the schedule time of operation
v is the start execution time of v. A schedule s is called periodic with period P
if:
$$\forall n \in N, \; \forall v \in V: \; s_{n+1}(v) = s_n(v) + P. \tag{1}$$
When there is no resource constraint, a schedule s is said to be valid if and
only if the operations terminate before their results are needed. In this case,
we say that data dependencies are satisfied, which is equivalent to the following
mathematical inequality:
$$\forall n \in N, \; \forall e_{u,v} \in E: \; s_{n+w(e_{u,v})}(v) \ge s_n(u) + d(u). \tag{2}$$
2.3 Maximum Throughput of Synchronous Sequential Circuits
Let C be the set of directed cycles in the directed cyclic graph modeling the
circuit. Based on data dependency constraints only, the maximum throughput,
denoted T, is given by the following expression [Bennour 1996; Bennour and
Aboulhamid 1995]:
$$T = \min_{c \in C} \left( \left( \sum_{e_{u,v} \in c} w(e_{u,v}) \right) \Big/ \left( \sum_{\forall v \in V \text{ and } e_{u,v} \in c} d(u) \right) \right) \tag{3}$$
Determining the maximum throughput is a Minimal Cost-to-Time Ratio
Cycle Problem [Gerez et al. 1992; Lawler 1976]. This problem can be solved
in the general case with a run-time in $O(|V||E| \log(|V| d_{\max}))$ [Dasdan and Gupta
1998; Lawler 1976], where $d_{\max} = \max_{v \in V}(d(v))$. A possible method to solve this
problem is to iteratively apply the Bellman-Ford algorithm [Cormen et al. 1990]
for longest paths on the graph $G_P = (V, E, d, w_P)$ derived from G by letting:
$$w_P(e_{u,v}) = d(u) - P \cdot w(e_{u,v}), \tag{4}$$
where $e_{u,v} \in E$ and P = 1/T. A binary search may be used to find the minimal
value of P for which there is no positive cycle in $G_P$ [Bennour 1996; Bennour and
Aboulhamid 1995]. Without loss of generality, for circuits that do not attempt
to perform wave pipelining, we assume that P is greater than or equal to the
propagation delay of each computational element in the circuit.
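The binary search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the function names, the floating-point tolerance, the search bounds, and the 4-vertex example circuit are all introduced here and are not taken from the article (in particular, the example is not the circuit of Figure 2). The upper bound assumes that every cycle carries at least one register.

```python
def has_positive_cycle(vertices, edges, d, P, eps=1e-9):
    """Bellman-Ford for longest paths on G_P.

    edges maps (u, v) -> w(e_uv); arc weights follow Equation (4):
    w_P(e_uv) = d(u) - P * w(e_uv). Returns True if some cycle has positive weight.
    """
    dist = {v: 0.0 for v in vertices}   # as if a virtual source reached every vertex
    for _ in range(len(vertices)):
        changed = False
        for (u, v), w in edges.items():
            candidate = dist[u] + d[u] - P * w
            if candidate > dist[v] + eps:
                dist[v] = candidate
                changed = True
        if not changed:
            return False                # longest paths converged: no positive cycle
    return True                         # still improving after |V| passes


def minimal_period(vertices, edges, d, tol=1e-6):
    """Binary search for the smallest P such that G_P has no positive cycle."""
    lo = max(d[v] for v in vertices)    # the text assumes P >= every propagation delay
    hi = sum(d[v] for v in vertices)    # safe if every cycle carries at least one register
    if not has_positive_cycle(vertices, edges, d, lo):
        return float(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_positive_cycle(vertices, edges, d, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical 4-vertex circuit (not the circuit of Figure 2).
vertices = ["a", "b", "c", "e"]
d = {"a": 10, "b": 20, "c": 30, "e": 5}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 1,
         ("b", "a"): 1, ("a", "e"): 0, ("e", "c"): 0}
P = minimal_period(vertices, edges, d)
print(round(P, 3))   # critical cycle a -> b -> c -> a: (10 + 20 + 30) / 1 register
```

On this example the search converges to a period of about 60, the delay-to-register ratio of the cycle a, b, c. A floating-point search is used only for brevity; an exact variant could work with rational arithmetic, since P is a ratio of integer sums.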
By applying expression (3) to the example circuit in Figure 2, the value of
P is 60. This value corresponds to the cycle defined by vertices $v_1$, $v_2$, $v_4$, and
$v_5$. Notice that applying basic retiming for minimal clock period to that circuit
leads to a clock period larger than P. Indeed, it leads to a clock period of 70.
2.4 Periodic Schedule for a Given Period
From equation (1) and inequality (2), we have that:
$$\forall e_{u,v} \in E, \; s_0(v) - s_0(u) \ge d(u) - P \cdot w(e_{u,v}). \tag{5}$$
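As an illustration of how inequality (5) summarizes inequality (2) for periodic schedules, the Python sketch below checks a candidate schedule both ways: once with the closed form (5), and once by expanding the first few iterations with equation (1). The function names, the example circuit, and the two candidate schedules are hypothetical and are not taken from the article.

```python
def is_valid_periodic(s0, P, edges, d):
    """Inequality (5): s0[v] - s0[u] >= d[u] - P * w for every edge (u, v)."""
    return all(s0[v] - s0[u] >= d[u] - P * w for (u, v), w in edges.items())


def is_valid_unrolled(s0, P, edges, d, iterations=5):
    """Inequality (2) checked directly on the first few iterations,
    with s_n(v) expanded as s_0(v) + n * P according to equation (1)."""
    def s(n, v):
        return s0[v] + n * P

    return all(
        s(n + w, v) >= s(n, u) + d[u]
        for n in range(iterations)
        for (u, v), w in edges.items()
    )


# Hypothetical 4-vertex circuit (not the circuit of Figure 2).
d = {"a": 10, "b": 20, "c": 30, "e": 5}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 1,
         ("b", "a"): 1, ("a", "e"): 0, ("e", "c"): 0}
P = 60
good = {"a": 0, "b": 10, "c": 30, "e": 10}
bad = {"a": 0, "b": 5, "c": 30, "e": 10}    # b would start before a has finished

print(is_valid_periodic(good, P, edges, d), is_valid_unrolled(good, P, edges, d))
print(is_valid_periodic(bad, P, edges, d), is_valid_unrolled(bad, P, edges, d))
```

Both checks agree on each schedule, because the term n · P cancels on the two sides of inequality (2) once equation (1) is substituted.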
In the case of periodic schedules, determining a valid schedule of all the instances
of each vertex v in V is equivalent to determining $s_0(v)$ for each v in V,
which consists of finding a solution to the system of inequalities described by
(5). To solve this system, the graph $G_P$ described in the previous section may be
used. Note that ASAP and ALAP schedules are possible solutions to this system.
To find an ASAP schedule, the Bellman-Ford algorithm [Cormen et al. 1990]
for longest paths, from a chosen vertex $v_x$ to the other vertices, may be applied
to the graph $G_P$. Finding an ALAP schedule may be done as follows. In Step 1,
a graph G′ has to be derived from $G_P$ by inverting the direction of each edge in
$G_P$. In Step 2, the Bellman-Ford algorithm for longest paths, from the vertex $v_x$
to the other vertices, has to be applied to the graph G′, where the weights of
its edges are defined by Equation (4). Finally, in Step 3, the ALAP schedule is
obtained by multiplying each result in Step 2 by −1. Relative to $v_x = v_1$, the
ASAP schedules of vertices $v_1$, $v_2$, $v_3$, $v_4$, $v_5$, and $v_6$ of the circuit in Figure 2 are
0, −30, 30, −10, −40, and −30, respectively. Their ALAP schedules are 0, −30,
40, −10, −40, and 10, respectively.
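The ASAP and ALAP constructions described above can be sketched in Python as follows. This is an illustrative sketch only: the function names and the 4-vertex example circuit are assumptions introduced here (the example is not the circuit of Figure 2, whose edges are not listed in the text), and the code assumes that there are no parallel edges, that $G_P$ has no positive cycle, and that every vertex can reach and be reached from the chosen vertex.

```python
def longest_paths(vertices, arcs, source):
    """Bellman-Ford for longest paths; arcs maps (u, v) -> weight."""
    dist = {v: float("-inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for (u, v), weight in arcs.items():
            if dist[u] + weight > dist[v]:
                dist[v] = dist[u] + weight
    return dist


def asap_alap(vertices, edges, d, P, vx):
    # Arc weights of G_P from Equation (4): w_P(e_uv) = d(u) - P * w(e_uv).
    gp = {(u, v): d[u] - P * w for (u, v), w in edges.items()}
    asap = longest_paths(vertices, gp, vx)
    # G': every arc of G_P reversed, weights unchanged; negate the longest paths.
    reversed_gp = {(v, u): weight for (u, v), weight in gp.items()}
    alap = {v: -t for v, t in longest_paths(vertices, reversed_gp, vx).items()}
    return asap, alap


# Hypothetical 4-vertex circuit (not the circuit of Figure 2), with P = 60.
vertices = ["a", "b", "c", "e"]
d = {"a": 10, "b": 20, "c": 30, "e": 5}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 1,
         ("b", "a"): 1, ("a", "e"): 0, ("e", "c"): 0}
asap, alap = asap_alap(vertices, edges, d, P=60, vx="a")
print(asap)   # {'a': 0, 'b': 10, 'c': 30, 'e': 10}
print(alap)   # {'a': 0, 'b': 10, 'c': 30, 'e': 25}
```

In this example only vertex e has slack: it may start anywhere between 10 and 25, whereas a, b, and c lie on the critical cycle and have identical ASAP and ALAP times. This freedom is what a schedule-aware register placement can exploit.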
Fig. 3. Schedule graph.
2.5 Schedule Graph
A periodic schedule, with period P, is expressed by a schedule graph
$G_s = (V, E, d, T_s, P)$ [Boyer et al. 2001a]. Here V, E, and d have the same definitions given
for the graph G previously defined. $T_s : E \to Q$ is a weight function
which associates to each edge $e_{u,v}$ in E the time distance between the schedule
times of u and v. Mathematically, $T_s(e_{u,v})$ is defined as follows:
$$\forall e_{u,v} \in E, \; T_s(e_{u,v}) = s_{w(e_{u,v})}(v) - s_0(u). \tag{6}$$
Because s is periodic with period P, Equation (6) may be rewritten as follows:
$$\forall e_{u,v} \in E, \; T_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v}). \tag{7}$$
The graph $G_s$ is consistent if and only if, for each edge $e_{u,v}$ in E,
$T_s(e_{u,v}) \ge d(u)$. This is derived from inequality (2). Figure 3 shows a consistent schedule
graph, where edges are labeled with $T_s$ values for the circuit in Figure 2, using
the ASAP schedule determined in Section 2.4. The weight of each arc in the
schedule graph is expressed in units of time.
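A short Python sketch of Equation (7) and of the consistency test may help here. The function names, the example circuit, and its schedule are hypothetical (they are not the data of Figures 2 and 3); only the two formulas come from the text.

```python
def schedule_graph_weights(s0, P, edges):
    """Equation (7): T_s(e_uv) = s0(v) - s0(u) + P * w(e_uv)."""
    return {(u, v): s0[v] - s0[u] + P * w for (u, v), w in edges.items()}


def is_consistent(Ts, d):
    """G_s is consistent iff T_s(e_uv) >= d(u) on every edge."""
    return all(t >= d[u] for (u, _v), t in Ts.items())


# Hypothetical 4-vertex circuit and schedule (not the circuit of Figure 2).
d = {"a": 10, "b": 20, "c": 30, "e": 5}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 1,
         ("b", "a"): 1, ("a", "e"): 0, ("e", "c"): 0}
s0 = {"a": 0, "b": 10, "c": 30, "e": 10}

Ts = schedule_graph_weights(s0, P=60, edges=edges)
print(Ts[("c", "a")], Ts[("b", "a")], Ts[("e", "c")])   # 30 50 20
print(is_consistent(Ts, d))                             # True
```

Edges whose weight exceeds d(u), such as (b, a) and (e, c) in this example, are the ones on which a value has to wait after being produced.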
3. REGISTER PLACEMENT AND ASSIGNMENT OF PHASES
For circuits optimized using basic retiming [Leiserson and Saxe 1991], registers
are placed in the optimized circuit using the following formula:
$$\forall e_{u,v} \in E, \; w_r(e_{u,v}) = r(v) - r(u) + w(e_{u,v}),$$
where $w_r(e_{u,v})$ and $w(e_{u,v})$ are, respectively, the number of registers on the arc
$e_{u,v}$ after and before retiming, and r(u) is the value assigned by basic retiming to
each computational element u in the circuit.
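The retiming formula above can be applied mechanically once the values r(u) are known. The Python sketch below does so for a hypothetical three-element ring; the function name, the circuit, and the retiming values are assumptions made for illustration. The check that every retimed count stays nonnegative reflects the usual legality condition of basic retiming.

```python
def retimed_register_counts(edges, r):
    """w_r(e_uv) = r(v) - r(u) + w(e_uv) for every edge (u, v)."""
    wr = {(u, v): r[v] - r[u] + w for (u, v), w in edges.items()}
    if any(count < 0 for count in wr.values()):
        raise ValueError("illegal retiming: negative register count")
    return wr


# Hypothetical ring a -> b -> c -> a with two registers on the edge (c, a).
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}
r = {"a": 0, "b": 1, "c": 1}          # hypothetical retiming values

wr = retimed_register_counts(edges, r)
print(wr)   # {('a', 'b'): 1, ('b', 'c'): 0, ('c', 'a'): 1}
```

The total number of registers around the cycle is 2 before and after retiming, because the r terms cancel along any cycle. This is why basic retiming alone cannot lower the bound P imposed by data dependencies.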
In the rest of this section, we show how registers can be placed and controlled
in circuits derived by the process we presented in Section 1. To this end, we
review the method in Boyer et al. [2001a], which is a possible implementation
of that process. The approach we are proposing in this article leads to better
implementations of the process.
For the method proposed in Boyer et al. [2001a], registers are placed back
in the circuit by pipelining the schedule graph $G_s$ defined in Section 2.5. Every
path in $G_s$ that is longer than the minimal clock period P is broken by inserting
Fig. 4. Placement and phases of registers using the algorithm in Boyer et al. [2001a].
registers on it. For paths having a length (in units of time)
less than P, no register is required if operation chaining is assumed.
For synchronous single-phase sequential circuits, registers are controlled by
the same signal, called the clock. When clock skew is not supported, registers
in that case must receive the clock at the same moment. In synchronous multiphase
sequential circuits, registers are not necessarily controlled by the same
clock. In this case, the clocks can have the same period and be defined relative
to a global clock that can be one of those clocks. Each clock is then an offset of
the global clock. That offset is called the phase in the literature.
Circuits derived by the process we presented in Section 1 can be multiphase,
and all the clocks have the same period. In the case of the method in Boyer
et al. [2001a], which is a possible implementation of the process, once registers
are placed, the phases to control them are then computed as follows. The phase
of a register on the input of a computational element v is ($s_0(v)$ modulo P),
where $s_0(v)$ is the schedule of v, and P is the minimal clock period due to data
dependency constraints only.
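The phase rule can be illustrated with the ASAP schedule quoted in Section 2.4 for the circuit of Figure 2 and P = 60. In the Python sketch below, the function name and the vertex labels are introduced for illustration; the sketch maps every vertex to the phase that a register on its input would receive, without knowing which inputs actually carry a register, since that depends on the placement.

```python
def register_phases(s0, P):
    """Phase of a register on the input of v: s0(v) modulo P.

    Python's % returns a value in [0, P) for a positive P, including for
    negative schedule times, which is the convention needed here.
    """
    return {v: t % P for v, t in s0.items()}


# ASAP schedule quoted in Section 2.4 for the circuit of Figure 2, with P = 60.
s0 = {"v1": 0, "v2": -30, "v3": 30, "v4": -10, "v5": -40, "v6": -30}
phases = register_phases(s0, 60)
print(phases)
# {'v1': 0, 'v2': 30, 'v3': 30, 'v4': 50, 'v5': 20, 'v6': 30}
print(sorted(set(phases.values())))   # [0, 20, 30, 50]
```

The mapping yields four distinct candidate values. Only the inputs on which a register is actually placed contribute a phase to the final circuit, so the number of phases in a given placement is at most the number of distinct values computed this way.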
Figures 4(a) and 4(b) present the placement of registers and their phases
obtained using the algorithm provided in Boyer et al. [2001a] to place registers
using the schedule graph depicted in Figure 3. The latter graph corresponds
to the circuit in Figure 2 and is obtained as explained in Section 2.5. Data in
Figure 4(c) is provided to assist the reader interested in computing the phases
given in Figure 4(b). The number of registers that are placed in the circuit is 6,
and the number of phases to control them is 4.
The algorithm for register placement in Boyer et al. [2001a] is not optimal
in the sense that it does not use a minimum number of registers. Indeed, for
Figure 4(a), register $R_1$ can be omitted since there is no combinational path
longer than P between $R_4$ and $R_5$.
4. PROBLEM FORMULATION AND APPROACHES FOR ITS RESOLUTION
Our focus is to simultaneously realize Steps 2 and 3 in the process presented in
Section 1 in order to minimize the number of registers. The problem
we address in this article is then to determine a schedule with the minimum
register requirements, where the register placement is done during the
schedule determination. We do not support register sharing as in the case when
basic retiming is used, since, in our case, the obtained circuits can be multiphase
clocked sequential circuits, and, in this case, registers on the output of a computational
element can be shared only if they are controlled by the same phase.
However, once the registers are placed, one can examine the phases of registers
on the output of each computational element to decide whether to share them.
Let us present the problem in a way that makes it easier to understand our
approach to solving it. As explained in Section 3, the placement of registers consists
in pipelining the schedule graph to obtain a circuit that can operate with
the minimal clock period P. Recall that in Boyer et al. [2001a] the placement
of registers is done once the schedule is computed. If the schedule is given, the
problem transforms into a problem of pipelining the schedule graph while using
a minimal number of registers. The weight of each arc in the schedule graph
is given by Equation (7) (i.e., $\forall e_{u,v} \in E$, $T_s(e_{u,v}) = s_0(v) - s_0(u) + P \cdot w(e_{u,v})$).
Instead of fixing the schedule first, before pipelining the schedule graph, we
want to make the schedule a variable in the problem and then to pipeline the
resulting schedule graph.
We conjecture that the problem is NP-hard in its general form. We provide
in this section a mathematical formulation (MF) of the problem. From this MF,
we derive a mixed integer linear program (MILP) that can be used for solving
the problem for special or small-size circuits. In Section 5, we derive from this
MILP a linear program to determine approximate solutions to the problem for
general large circuits.
Before presenting the details related to MF and MILP, let us first give some
definitions and notations while introducing an informal formulation of the problem.
Figure 5 gives a portion of the schedule graph to pipeline, where i and j
are two computational elements. Unknown variables $x_{i,j}$ denote the number of
registers that must be placed on the arc $e_{i,j}$ to guarantee that the length
$l_{i,j}$ of every path that goes to j via i is less than or equal to the minimal clock
period P. Variable $l_{i,j}$ will be defined in the following. Note that, as in Boyer
et al. [2001a], operation chaining is assumed, and hence no register is required
if $l_{i,j} \le P$. Suppose that paths that go to j via i are already examined in order
to determine if some registers must be placed on them or not. Let $m_i$ be a nonnegative
real number greater than or equal to each remainder that is obtained
by dividing the length of each one of those paths by P. The length $l_{i,j}$ of every
path that goes to j via i is the sum of $m_i$ and $T_s(e_{i,j})$, where $T_s(e_{i,j})$ is defined
by Equation (7). Variable $y_{i,j}$ is the remainder of the division of $l_{i,j}$ by P. We require
that $m_i \le (P - d(i))$, which guarantees that, if a register R is placed on the