Synthesis of Asynchronous Hardware from Petri Nets

57 Pages·2012·0.59 MB·English


Josep Carmona (1), Jordi Cortadella (1), Victor Khomenko (2) and Alex Yakovlev (2)

(1) Universitat Politècnica de Catalunya, Barcelona, Spain
[email protected], [email protected]
(2) University of Newcastle, Newcastle upon Tyne NE1 7RU, UK
{Victor.Khomenko,[email protected]

Abstract. As semiconductor technology strides towards billions of transistors on a single die, problems concerned with deep sub-micron process features and design productivity call for new approaches in the area of behavioural models. This paper focuses on some of the recent developments and new opportunities for Petri nets in designing asynchronous circuits, such as synthesis of asynchronous control circuits from large Petri nets generated from front-end specifications in hardware description languages. These new methods avoid using the full reachability state space for logic synthesis. They include direct mapping of Petri nets to circuits, structural methods with linear programming, and synthesis from unfolding prefixes using SAT solvers.

1 Introduction

1.1 Semiconductor Technology Progress

The International Technology Roadmap for Semiconductors (ITRS) [1] predicts the end of this decade will be marked by the appearance of a System-on-a-Chip (SoC) containing four billion 50-nm transistors that will run at 10 GHz. With a steady growth of about 60% in the number of transistors per chip per year, following the famous Moore's law, the functionality of a chip doubles every 1.5 to 2 years. Such a SoC will inevitably consist of many separately timed communicating domains, regardless of whether they are internally clocked or not [1]. Built at the deep sub-micron level, where the effective impact of interconnects on performance, power and reliability will continue to increase, such systems present a formidable challenge for design and test methods and tools.

The key point raised in the ITRS is that design cost is the greatest threat to the continued phenomenal progress in microelectronics.
The only way to overcome this threat is through improving the productivity and efficiency of the design process, particularly by means of design automation and component reuse. The cost of design and verification of processing engines has reached the point where thousands of man-years are spent on a single design, yet processors reach the market with hundreds of bugs [1].

1.2 Self-timed systems and design tools

Getting rid of global clocking in SoCs offers potential added values, traditionally quoted in the literature [60]: greater operational robustness, power savings, electro-magnetic compatibility and self-checking. While the asynchronous design community continues its battle for the demonstration of these features to the semiconductor industry investors, the issue of design productivity may suddenly turn the die to the right side for asynchronous design. Why?

One of the important sub-problems of the productivity and reuse problem for globally clocked systems is that of timing closure. This issue arises when the overall SoC is assembled from existing parts, called Intellectual Property (IP) cores, where each part has been designed separately (perhaps even by a different manufacturer) for a certain clock period, assuming that the clock signal is delivered accurately, at the same time, to all parts of the system. Finding the common clocking mode for SoCs that are built from multiple IP cores is a very difficult problem to resolve.

Self-timed systems, or less radical, globally asynchronous locally synchronous (GALS) systems [11,70], are increasingly seen by industry as a natural way of composing systems from predesigned components without the necessity to solve the timing closure problem in its full complexity.
As a consequence, self-timed systems highlight a promising route to solving the productivity problem, as companies begin to realise. But they also begin to realise that without investing into design and verification tools for asynchronous design the above promise will not materialise. For example, Philips, whose products are critical to the time-to-market demands, is now the world leader in the exploitation of asynchronous design principles [27]. Other microelectronics giants such as Intel, Sun, IBM and Infineon follow the trend and gradually allow some of their new products to involve asynchronous parts. A smaller 'market niche' company, Theseus Logic, has been successful in down-streaming the results of their recent investment in asynchronous design methods (Null-Convention Logic) [26].

1.3 Design flow problem

The major obstacle now is the absence of a flexible and efficient design flow, which must be compatible with commercial CAD tools, such as for example the Cadence toolkit. A large part of such a design flow would be typically concerned with mapping the logic circuit (or sometimes macro-cell) netlist onto silicon area using place and route tools. Although hugely important, this part is outside our present scope of interest, as it is essentially the same as in the traditional design flow. What we are concerned with is the stage in which the behavioural specification of a circuit is converted into the logic netlist implementation.

The pragmatic approach to this stage suggests that the specification should appear in the form of a high-level Hardware Description Language (HDL). Examples of such languages are the widely known VHDL and Verilog, as well as Tangram [2] or Balsa [22] that are more specific for asynchronous design. The latter are based on the concepts of processes, channels and variables, similar to Hoare's CSP.
We can in principle be motivated by the success of behavioural synthesis achieved by synchronous design in the 90s. However, for synchronous design the task of translating an HDL specification to logic (see, e.g., [47]) is fairly different from what we may expect in the asynchronous case. Its first part was concerned with the so-called architectural synthesis, whose goal was the construction of a register-transfer level (RTL) description. This required extracting a control and data flow graph (CDFG) from the HDL, and performing scheduling and allocation of data operations to functional datapath units in order to produce an FSM for a controller or sequencer. The FSM was then constructed using standard synchronous FSM synthesis, which generated combinational logic and rows of latches.

Although some parts of architectural synthesis, such as CDFG extraction, scheduling and allocation, might stay unchanged for self-timed circuits, the development of the intermediate level, an RTL model of a sequencer, and its subsequent circuit implementation, would be quite different.

1.4 How can Petri nets help?

Two critical questions arise at this point. Firstly, what is the most adequate formal language for the intermediate (still behavioural) level description? Secondly, what should be the procedure for deriving logic implementation from such a description?

The present level of development of asynchronous design flow suggests the following options to answer those questions:

(1) Avoid (!) answering them altogether. Instead, follow a syntax-driven translation of the HDL directly into a netlist of hardware components, called handshake circuits. This sort of silicon-compilation approach was pursued at Philips with the Tangram flow [2]. Many computationally hard problems involving global optimisation of logic were also avoided. Some local 'peephole' optimisation was introduced at the level of handshake circuit description.
Petri nets were used for that in the form of Signal Transition Graphs (STGs) and their composition, with subsequent synthesis using the Petrify tool [52,18]. A similar sort of approach is currently followed by the designers of the Balsa flow, where the role of peephole optimisation tools is played by the FSM-based synthesis tool Minimalist [12]. The problem with this approach is that, while being very attractive from the productivity point of view, it suffers from the lack of global optimisation, especially for high-speed requirements, because direct mapping of the parsing tree into a circuit structure may produce very slow control circuits.

(2) Translate the HDL specification into an STG for the controller part and then synthesise it using Petrify. This approach was employed in [4], where the HDL was Verilog. This option was attractive because the translation of the Verilog constructs preserved the natural semantical execution order between operations (not the syntax structure!) and Petrify could apply logic optimisation at a fairly global level. If the logic synthesis stage was not constrained by the state space explosion inherent in Petrify, this would have been an ideal situation.

However, the state space explosion becomes a real spanner in the works, because the capability of Petrify to solve the logic synthesis problem is limited by the number of logic signals in the specification. STGs involving 40–50 binary variables can take hours of CPU time. The size of the model is critical not only for logic minimisation but, more importantly, for solving state assignment and logic decomposition problems. The state assignment problem often arises when the STG specification is extracted automatically from an HDL. This forces Petrify into solving Complete State Coding (CSC) using computationally intensive procedures involving calculation of regions in the reachability graph.
While the logic synthesis powers of Petrify should not be underestimated, one should be realistic about where they can be applied efficiently. Thus the solution lies where the design productivity similar to that of (1) can be achieved together with the circuit optimality offered by (2). We believe that the way to such a solution is through finding more efficient ways of logic synthesis in the framework of the design flow shown in Fig. 1.

[Figure 1: flow diagram with the blocks HDL Specification, Control/data splitting, Control Spec (Petri net), Datapath Spec, PN to circuit synthesis, Signal Refinement, Data logic synthesis, Control Logic, Data Logic, Control & data interfacing and HDL Implementation; one region is marked "Present Focus".]
Fig. 1. Design Flow with Logic Synthesis from Petri nets.

The original HDL specification is syntactically and semantically analysed, giving rise to control and datapath specifications. The datapath can be synthesised using standard RTL-based (synchronous) design flow, applied to the main fragments of the data path, namely combinational logic and registers. There exist methods of converting such logic to self-timed implementations, e.g., [43]. This aspect of design is outside our scope here. The control specification is assumed to be extracted from the HDL in the form of a Petri net, which will thus act as the intermediate behavioural representation. Such an extraction is in general non-trivial and relies on a rigorous semantic relationship between control-flow constructs used in typical behavioural HDLs and their equivalents in Petri nets. For example, if one uses Balsa, such constructs basically include sequencing, parallelisation, two-way and multi-way selection, arbitration and (forever, while and for) loops, as well as macro and procedure calls. Those can be translated into Petri nets quite efficiently, as done for example in PEP [3] for the translation of a basic high-level programming language notation, B(PN)^2, into Petri nets.
1.5 Methods for Logic Synthesis from Petri nets

The question of what kind of Petri nets is appropriate for subsequent logic synthesis of control depends on the method used for synthesis. Roughly, synthesis methods are split into two main categories. The first category comprises techniques of direct mapping of Petri net constructs to logic. In various forms it appeared in [51,20,32,68,74,6,58]. In the framework of 1-safe Petri nets and speed-independent circuits this problem was solved in [68], however only for autonomous (no inputs) specification where all operations were initiated by the control logic specified by a labelled Petri net. Another limitation was that the technique did not cover nets with arbitrary dynamic conflicts. Hollaar's one-hot encoding method [32] allowed explicit interfacing with the environment but required fundamental mode timing conditions, use of internal state variables as outputs and could not deal with conflicts and arbitration in the specifications. Patil's method [51] works for the whole class of 1-safe nets. However, it produces control circuits whose operation uses 2-phase (non-return-to-zero) signalling. This results in lower performance than what can be achieved for 4-phase circuits used in [68].

The second category considers the Signal Transition Graph refinement of the Petri net control specification. These methods usually perform an explicit logic synthesis, by deriving Boolean equations for the output signals of the controller using the notion of next state functions obtained from the STG [14,18]. It should be noted that sometimes the STG specification for control can be obtained directly from the original specifications, e.g., if those are provided in the form of Timing Diagrams.

In this paper we will not concentrate on the problem of synthesis of Petri nets for logic synthesis of controllers and refer the reader to most recent literature, such as [4].
Our focus will be on the most recent advances in logic synthesis from Petri nets and Signal Transition Graphs. These methods try to avoid using the state space generated by the Petri net model directly. They follow two possible approaches. The first one, called a structural approach, performs graph-based transformations on the STG and deals with the approximated state space by means of linear algebraic representations. The second one, called an unfolding-based method, represents the state space in the form of true concurrency (or partial order) semantics provided by Petri net unfoldings.

    y := 0;
    loop
        x := READ(IN);
        WRITE(OUT, (x+y)/2);
        y := x;
    end loop

[Figure 2, right-hand part: a block labelled "filter" with input channel IN (signals R_in, A_in) and output channel OUT (signals R_out, A_out).]
Fig. 2. High-level specification of a filter.

The remaining structure of the paper is as follows. Section 2 introduces the problem of synthesis of control circuits from Petri net based specifications. It will do it in an informal way by considering two characteristic examples of control logic to be designed by this sort of methodology. Section 3 provides an overview of the traditional state-based synthesis, which is currently implemented in the Petrify tool. Section 4 describes structural methods and use of integer linear programming in logic synthesis. Section 5 presents how Petri net unfoldings and Boolean satisfiability problem (SAT) solvers can be used in the synthesis of asynchronous control logic. Section 6 briefly overviews some other related methodologies and outlines the important current and future research directions.
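As a rough software analogue of the algorithm in Fig. 2, the following Python sketch computes the same sequence of outputs. It is an illustration only and not part of the design flow: the name `filter_stream` is hypothetical, the endless loop is replaced by iteration over a finite list of integer samples, and the division by two is written as a one-bit right shift.

```python
def filter_stream(samples):
    """Yield the integer average of each sample x and the previous sample y."""
    y = 0                      # y := 0
    for x in samples:          # loop: x := READ(IN)
        yield (x + y) >> 1     # WRITE(OUT, (x+y)/2), done as a right shift
        y = x                  # y := x


outputs = list(filter_stream([4, 8, 6]))
print(outputs)
```

The first value produced averages the first sample with the initial y = 0, so it does not represent two real samples.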
2 Synthesis Problem: Simple Examples and Signal Transition Graph Definition

We shall introduce the problem of synthesis of control circuits from Petri net specifications using two simple but realistic design examples. This will also help us to present the two main types of control hardware that can be designed with the methods described in this paper. The first example, a simple data processing controller, will illustrate the design flow starting from an algorithmic, HDL-based, specification. The second one, an interface controller, will show the design starting from a waveform, Timing Diagram based, specification. Algorithmic and waveform specifications are the most popular forms of behavioural notation amongst hardware designers. While describing the second example we will introduce our main specification model, the Signal Transition Graph (STG).

2.1 A simple filter controller

We illustrate a typical design flow by means of the example shown in Fig. 2. The algorithm describes a simple filter that reads data items from an input channel (IN) and writes the filtered data into an output channel (OUT) by averaging the last two samples, x and y. (Note that the first output value in this case may be invalid and should be ignored by the environment.) The interaction with the environment is asynchronous, using a four-phase protocol implemented by a pair of ⟨Request, Acknowledge⟩ signals, as shown in Fig. 3.

[Figure 3: waveforms of DATA (item i, item i+1), Req and Ack.]
Fig. 3. Four-phase handshake protocol.

One of the possible implementations of the filter is depicted in the block diagram of Fig. 4. It contains two level-sensitive latches, x and y, and one adder (the averaging of x and y is achieved simply by a one-bit right shift of the bits of the sum x+y). Each of the components operates according to a four-phase protocol as follows:

- The latches are transparent when R is high and opaque when low. A being high indicates that the data transfer through the latch has been completed.
- The adder starts its operation when R goes high. After a certain delay, signal A will be asserted, indicating that the addition has been finished and the output is valid. After that, R and A go low to complete the four-phase protocol.

[Figure 4: block diagram with the adder (+), the latches x and y, the control block, the channels IN and OUT, and the signals R_x, A_x, R_y, A_y, R_a, A_a, R_in, A_in, R_out, A_out.]
Fig. 4. Block diagram for the filter.

The acknowledge signals of the latches and the adder can be implemented in many different ways, depending on how the blocks are designed. One way of doing that is by simply inserting a delay between R and A that mimics the worst-case delay of the corresponding block, as typically done for bundled-data components in micropipelines [64].

The signals ⟨R_in, A_in⟩ and ⟨R_out, A_out⟩ perform the synchronisation of the IN and OUT channels, respectively. R_in indicates the validity of IN. After A_in goes high, the environment is allowed to modify IN. On the other side, R_out and A_out should be able to control a level-sensitive latch in a similar way as described above for the latches x and y.

Synthesis of control

The synchronisation of the functional units depicted in Fig. 4 is performed by the control block, which is responsible for circulating the data items in the data-path in such a way that the required computations are performed as specified by the algorithm.

In this paper, we use specially interpreted Petri nets, called Signal Transition Graphs (STGs), to specify the behaviour of asynchronous controllers. The transitions represent signal events (i.e., rising or falling edges of signals), whereas the arcs and places represent the causality relations among the events.

[Figure 5: marked graph over the transitions R+, A+, R− and A− of each of the signal pairs in, x, y, a and out; the arcs are not recoverable from the extraction.]
Fig. 5. Behavioural specification of the control.

Fig. 5 describes one possible behaviour of the control that results in a correct operation of the circuit.
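The four-phase discipline that each request/acknowledge pair above obeys (R rises, A rises, R falls, A falls) can be captured by a small protocol checker. This is a minimal Python sketch for illustration only; the class name `FourPhaseChannel` and its `event` method are hypothetical and are not taken from the paper or from any tool.

```python
class FourPhaseChannel:
    """Request/acknowledge pair that must cycle R+ -> A+ -> R- -> A-."""

    # The only legal next event for each (req, ack) level combination.
    NEXT = {(0, 0): "R+", (1, 0): "A+", (1, 1): "R-", (0, 1): "A-"}

    def __init__(self):
        self.req, self.ack = 0, 0   # both signals start low
        self.trace = []             # events accepted so far

    def event(self, name):
        expected = self.NEXT[(self.req, self.ack)]
        if name != expected:
            raise ValueError(f"protocol violation: got {name}, expected {expected}")
        if name[0] == "R":
            self.req ^= 1           # toggle the request level
        else:
            self.ack ^= 1           # toggle the acknowledge level
        self.trace.append(name)


channel = FourPhaseChannel()
for ev in ["R+", "A+", "R-", "A-"]:
    channel.event(ev)
print(channel.trace, (channel.req, channel.ack))
```

After one complete cycle both signals are low again, so the next legal event is once more a rising request.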
In the case of Fig. 5, the behaviour can be described by a marked graph, a subclass of Petri nets without choice. Marked graphs are often represented by omitting the places between transitions. Each pair of req/ack signals follows a four-phase protocol, determined by the arcs R+ → A+ → R− → A− → R+. The rest of the arcs are the ones that define how data items move along the data-path. For the sake of brevity, only a couple of them are discussed.

The arc R_in+ → R_x+ indicates that the latch x can become transparent when there is some valid data at the IN channel. Moreover, the data can only be read once the latch y has captured the previous data from x. This is guaranteed by the arc A_y− → R_x+.

On the other hand, the adder will start a new operation every time the latch x has acquired new data. This is indicated by the arc A_x+ → R_a+. The result will be sent to the OUT channel when the addition has completed (arc A_a+ → R_out+).

[Figure 6: gate-level circuit over the signals R_in, A_in, R_x, A_x, R_y, A_y, R_a, A_a, R_out and A_out, including one gate labelled C.]
Fig. 6. Asynchronous controller for the filter.

From the specification of the control, a logic circuit can be synthesised. The circuit shown in Fig. 6 has been obtained by the Petrify tool.

2.2 VME bus controller

Our second example is a fragment of a VME bus slave interface [75]. It will help us to illustrate how the STG specification of an asynchronous controller can be derived from its original Timing Diagram specification. Fig. 7(a) depicts the interface of a circuit that controls data transfers between a VME bus and a device. The main task of the bus controller is to open and close the data transceiver through signal d according to a given protocol to read/write data from/to the device.

[Figure 7: (a) block diagram with the Bus, Data Transceiver, Device and VME Bus Controller and the signals dsr, dsw, dtack, lds, ldtack and d; (b) waveforms of dsr, dtack, lds, ldtack and d; (c) STG over the transitions dsr+, lds+, ldtack+, d+, dtack+, dsr−, d−, dtack−, lds− and ldtack−.]
Fig. 7. VME bus controller: interface (a), the timing diagram for the read cycle (b) and the STG for the read cycle (c).

The input and output signals of the bus controller are as follows:

- dsr and dsw are input signals that request to do a read or write operation, respectively.
- dtack is an output signal that indicates that the requested operation is ready to be performed.
- lds is an output signal to request the device to perform a data transfer.
- ldtack is an input signal coming from the device indicating that the device is ready to perform the requested data transfer.
- d is an output signal that enables the data transceiver. When high, the data transceiver connects the device with the bus.

The direction of the transfer (read or write) is defined by the high or low level of a special (RW) signal, which is part of the address/data bundle.

Fig. 7(b) shows a timing diagram of the read cycle. In this case, signal dsw is always low and not depicted in the diagram. The behaviour of the controller is as follows: a request to read from the device is received by signal dsr. The controller transfers this request to the device by asserting signal lds. When the device has the data ready (ldtack high), the controller opens the transceiver to transfer data to the bus (d high). Once data has been transferred, dsr will become low, indicating that the transaction must be finished. Immediately after, the controller will lower signal d to isolate the device from the bus. After that, the transaction will be completed by a return-to-zero of all interface signals, seeking maximum parallelism between the bus and the device operations.

Our controller also supports a write cycle with a slightly different behaviour. For the sake of simplicity, we have described in detail only the read cycle.

The model that will be used to specify asynchronous controllers is based on Petri nets [53,49]. It is called Signal Transition Graph (STG) [55,13].
Roughly speaking, an STG is a formal model for timing diagrams. Now we explain how to derive an STG from a timing diagram.

From Timing Diagrams to Signal Transition Graphs

A timing diagram specifies the events (signal transitions) of a behaviour and their causality relations. An STG is a formal model for this type of specifications. In its simplest form, an STG can be considered as a causality graph in which each node represents an event and each arc a causality relation. An STG representing the behaviour of the read cycle for the VME bus is shown in Fig. 7(c). Rising and falling transitions of a signal are represented by the superscripts + and −, respectively.

Additionally, an STG can also model all possible dynamic behaviours of the system. This is the rôle of the tokens held by some of the causality arcs. An event is enabled when it has at least one token on each input arc. An enabled event can fire, which means that the event occurs. When an event fires, a token is removed from each input arc and a token is put on each output arc. Thus, the firing of an event produces the enabling of another event. The tokens in the specification represent the initial state of the system.

The initial state in the specification of Fig. 7(c) is defined by the tokens on the arcs dtack− → dsr+ and ldtack− → lds+. In this state, there is only one event enabled, viz. dsr+. It is an event on an input signal that must be produced by the environment. The occurrence of dsr+ removes a token from its input arc and puts a token on its output arc. In that state, the event lds+ is enabled. In this case, it is an event on an output signal, that must be produced by the circuit modelled by this specification.

After firing the sequence of events ldtack+, d+, dtack+, dsr− and d−, two tokens are placed on the arcs d− → dtack− and d− → lds−.
In this situation, two events are enabled and can fire in any order independently from each other, i.e., these events are concurrent, which is naturally modelled by the STG.

Choice in Signal Transition Graphs

In some cases, alternative behaviours, or modes, can occur depending on how the environment interacts with the system. In our example, the system will react differently depending on whether the environment issues a request to read or a request to write. Typically, different behavioural modes are represented by different timing diagrams. For example, Fig. 8(a) and 8(b) depict the STGs corresponding to the read and write cycles, respectively. In these pictures, some arcs have been split and circles inserted in between. These circles represent places that can hold tokens. In fact, each arc going from one transition to another has an implicit place that holds the tokens located in that arc.

By looking at the initial markings, one can observe that the transition dsr+ is enabled in the read cycle, whereas dsw+ is enabled in the write cycle.
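The token game described earlier for the read cycle of Fig. 7(c) can be replayed in a few lines of Python. This is only a sketch under stated assumptions: the figure is not reproduced here, so the arc list is an assumed reconstruction chosen to agree with the initial marking and the firing sequence given in the text, and the names `ARCS`, `INITIAL`, `enabled` and `fire` are illustrative.

```python
from collections import Counter

# Assumed arcs (source event, target event) of the read-cycle STG.
# Each arc carries an implicit place that may hold tokens.
ARCS = [
    ("dsr+", "lds+"), ("lds+", "ldtack+"), ("ldtack+", "d+"),
    ("d+", "dtack+"), ("dtack+", "dsr-"), ("dsr-", "d-"),
    ("d-", "dtack-"), ("d-", "lds-"), ("lds-", "ldtack-"),
    ("ldtack-", "lds+"), ("dtack-", "dsr+"),
]

# Initial marking stated in the text: one token on each of these two arcs.
INITIAL = Counter({("dtack-", "dsr+"): 1, ("ldtack-", "lds+"): 1})


def enabled(marking):
    """Events with at least one token on every input arc, sorted by name."""
    events = {dst for _, dst in ARCS}
    return sorted(e for e in events
                  if all(marking[arc] > 0 for arc in ARCS if arc[1] == e))


def fire(marking, event):
    """Return the marking reached by firing an enabled event."""
    if event not in enabled(marking):
        raise ValueError(f"{event} is not enabled")
    nxt = Counter(marking)
    for arc in ARCS:
        if arc[1] == event:
            nxt[arc] -= 1   # consume a token from each input arc
        if arc[0] == event:
            nxt[arc] += 1   # produce a token on each output arc
    return nxt


marking = INITIAL
for ev in ["dsr+", "lds+", "ldtack+", "d+", "dtack+", "dsr-", "d-"]:
    marking = fire(marking, ev)
print(enabled(marking))
```

Under the assumed arc list, only dsr+ is enabled initially, and after d− fires the two enabled events are dtack− and lds−, which is the concurrent situation the text describes.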
