3.1 Communication Between Two Building Blocks
Fig. 3.13 Timing diagram (initiator state: sU_i, sV_i, sW_i, sU_i; target state: sU_t, sV_t, sU_t; Phases 1 to 4; signals: output data d, valid, accept, target register d)
The initiator and target both start in a state that is part of the super-state sU_i and sU_t, respectively. The valid and accept control signals are set to logic-0 in these super-states. This marks the start of Phase 1 in the handshake protocol. The initiator subsequently starts Phase 2 of the protocol by transitioning to a state in the super-state sV_i, where it applies the data element d1 to its data output and toggles its valid output signal once. These actions are synchronous to the initiator clock “clk_i”. The target continuously samples the valid signal of the initiator using the target clock “clk_t”. When the target is in super-state sU_t and observes that the initiator has toggled its valid signal, it responds by transitioning to a state in the super-state sV_t, where it samples the data signals in a local register once using its local clock signal. In the super-state sV_t, the target toggles its accept output signal once, synchronous to its local clock, to indicate to the initiator that it has sampled the data. This ends Phase 2 of the protocol. The initiator continuously samples the accept signal of the target on its local clock. When the initiator is in the super-state sV_i and observes that the target has toggled its accept signal as well, it responds by transitioning to a state in the super-state sW_i. At this point the initiator is allowed to change the value on its data output.
The transfer of another data element d2 occurs in the same way as the first data element d1, except that the values of the valid and accept signals are inverted. During this second handshake, the initiator transitions from super-state sW_i through super-state sX_i back to a state in super-state sU_i, while the target transitions from super-state sV_t back to a state in super-state sU_t, all under control of the valid and accept signals.
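The two-phase handshake described above can be sketched as a small discrete-time simulation. This is a minimal sketch, not the book's implementation: it assumes integer clock periods on a shared time axis, ideal sampling without meta-stability, and a hypothetical function name and parameters (`two_phase_transfer`, `period_i`, `period_t`). Each clock domain reads the other domain's signals as they were just before the current time step, so a signal changed on an edge is only seen on a later edge.

```python
def two_phase_transfer(data, period_i=3, period_t=5, max_time=10_000):
    """Simulate the two-phase (toggle) handshake and return the values that
    the target sampled into its local register, in order."""
    valid = 0        # toggled by the initiator on clk_i
    accept = 0       # toggled by the target on clk_t
    out = None       # initiator data output
    received = []    # contents of the target register after each handshake
    idx = 0          # index of the next element the initiator sends
    for t in range(max_time):
        # Values driven before this time step; both domains sample these.
        valid_seen, accept_seen, out_seen = valid, accept, out
        if t % period_i == 0:                     # active edge on clk_i
            # valid == accept: previous handshake complete (sU_i or sW_i).
            if valid == accept_seen and idx < len(data):
                out = data[idx]                   # apply the next data element
                idx += 1
                valid ^= 1                        # toggle valid once
        if t % period_t == 0:                     # active edge on clk_t
            # valid != accept: the initiator toggled valid, a request is pending.
            if valid_seen != accept:
                received.append(out_seen)         # sample the data once
                accept ^= 1                       # toggle accept once
        if idx == len(data) and valid == accept:  # all elements accepted
            break
    return received
```

For example, `two_phase_transfer(["d1", "d2", "d3"])` returns the three elements in order; changing the clock periods changes how long each handshake takes, but not which values the target samples.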
The four-phase handshake protocol is similarly illustrated in Figs. 3.13, 3.14, and 3.15. The initiator and target again start in a state that is part of the super-state sU_i and sU_t, respectively. The valid and accept control signals are set to logic-0 in these super-states. This marks the start of Phase 1 in this protocol. The initiator subsequently starts Phase 2 of this protocol by transitioning to a state in the super-state sV_i, where it applies the data element d to its data output and asserts its valid output signal. These actions are synchronous to the initiator clock. The target continuously samples the valid signal of the initiator using the target clock. When the target is in super-state sU_t and observes that the initiator has asserted its valid signal, it responds by transitioning to a state in the super-state sV_t. In super-state sV_t, it samples the data signals in a local register once using its local clock and asserts its accept output signal to indicate to the initiator that it has sampled the data. This ends Phase 2 of this protocol. The initiator continuously samples the accept signal from the target on its local clock. When the initiator is in super-state sV_i and observes that the target has asserted its accept signal, it responds by transitioning to a state in the super-state sW_i, where it deasserts its valid output signal synchronous to its local clock. At this point the initiator is allowed to change the value on its data output. This ends Phase 3 of this protocol. When the target is in the super-state sV_t and observes the deassertion of the valid signal by the initiator, it responds by transitioning to a state in the super-state sU_t, where it deasserts its accept output signal synchronous to its local clock. This ends Phase 4 of this protocol. When the initiator is in the super-state sW_i and observes that the target has deasserted its accept signal, it responds by transitioning to a state in the super-state sU_i, where it can optionally start the transfer of another data element.

3 Post-silicon Debugging of Multiple Building Blocks

Fig. 3.14 Initiator STG (super-states sU_i, sV_i, and sW_i)

Fig. 3.15 Target STG (super-states sU_t and sV_t)
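A corresponding minimal sketch of the four-phase handshake uses explicit super-state names. As before, it is not the book's implementation: it assumes integer clock periods, ideal sampling without meta-stability, and a hypothetical function name and parameters. Each branch is annotated with the protocol phase that it starts or ends.

```python
def four_phase_transfer(data, period_i=3, period_t=5, max_time=10_000):
    """Simulate the four-phase (return-to-zero) handshake and return the
    values that the target sampled into its local register, in order."""
    valid, accept, out = 0, 0, None
    state_i, state_t = "sU_i", "sU_t"     # both sides start in Phase 1
    received, idx = [], 0
    for t in range(max_time):
        valid_seen, accept_seen, out_seen = valid, accept, out
        if t % period_i == 0:                       # active edge on clk_i
            if state_i == "sU_i" and idx < len(data):
                out, idx = data[idx], idx + 1       # apply the data element
                valid, state_i = 1, "sV_i"          # starts Phase 2
            elif state_i == "sV_i" and accept_seen == 1:
                valid, state_i = 0, "sW_i"          # ends Phase 3; data may change
            elif state_i == "sW_i" and accept_seen == 0:
                state_i = "sU_i"                    # may start another transfer
        if t % period_t == 0:                       # active edge on clk_t
            if state_t == "sU_t" and valid_seen == 1:
                received.append(out_seen)           # sample the data once
                accept, state_t = 1, "sV_t"         # ends Phase 2
            elif state_t == "sV_t" and valid_seen == 0:
                accept, state_t = 0, "sU_t"         # ends Phase 4
        if idx == len(data) and state_i == "sU_i" and state_t == "sU_t":
            break
    return received
```

Compared with the two-phase sketch, every element now needs the control signals to be both asserted and deasserted, so the four-phase protocol generally needs more clock edges per element.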
Please note that the target STGs in Figs. 3.12 and 3.15 are simplifications of real implementations, intended to help focus on the operation of the handshake protocols. As both handshake protocols allow the target to delay its acceptance of the data, the target STG can be extended as shown in Fig. 3.16. The STG in Fig. 3.16 is the STG of a target implementing the two-phase handshake protocol, but with two additional super-states sW_t and sX_t. The target does not immediately acknowledge a request from the initiator when it is in the super-state sW_t or sX_t. The initiator stalls in super-state sV_i or sX_i, respectively, until the target returns to super-state sU_t or sV_t, respectively, and accepts the data. The super-states sW_t and sX_t can, for example, be used to implement the uninterrupted processing of previously-received data elements by the target.
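The effect of the extra target super-states can be sketched by extending the two-phase simulation with a target that stays busy for a hypothetical number of cycles (`process_cycles`) after each accepted element. While busy, the target is in sW_t (accept at 0) or sX_t (accept at 1) and ignores valid. This is an illustrative model under stated assumptions (ideal sampling, integer clock periods), not the book's STG.

```python
def two_phase_busy_target(data, process_cycles=2, period_i=3, period_t=5,
                          max_time=10_000):
    """Two-phase handshake with a target that stays busy for process_cycles
    target cycles after each accepted element. Returns (received, stalls),
    where stalls counts initiator edges on which a request was still pending."""
    valid, accept, out = 0, 0, None
    received, idx = [], 0
    busy = 0        # > 0: target is in sW_t (accept = 0) or sX_t (accept = 1)
    stalls = 0
    for t in range(max_time):
        valid_seen, accept_seen, out_seen = valid, accept, out
        if t % period_i == 0:                        # clk_i edge
            if valid == accept_seen:                 # previous element accepted
                if idx < len(data):
                    out, idx = data[idx], idx + 1
                    valid ^= 1
            else:
                stalls += 1                          # waiting in sV_i or sX_i
        if t % period_t == 0:                        # clk_t edge
            if busy > 0:
                busy -= 1                            # ignore valid while busy
            elif valid_seen != accept:
                received.append(out_seen)
                accept ^= 1
                busy = process_cycles                # process before next accept
        if idx == len(data) and valid == accept:
            break
    return received, stalls
```

With `process_cycles=0` the target behaves like the simplified STG; larger values only make the initiator stall longer, while the received data stay the same.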
These asynchronous handshake protocols do not prevent meta-stability on the sampled valid and accept control signals. Any meta-stability on these signals is, however, guaranteed to be resolved eventually, because the initiator stalls and thereby keeps its valid and data signals stable until the target acknowledges that it has accepted the request from the initiator. Meta-stability on the handshake control signals therefore does not cause meta-stable data to be sampled. No meta-stability occurs when the data signals are sampled in a register in the target, because these data signals are held stable by the initiator when the target samples them. Consequently, handshake-based communication techniques ensure the correct transfer of data by design, even when the clocks are asynchronous.

Fig. 3.16 Extended target STG (super-states sU_t, sV_t, sW_t, and sX_t)
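The argument above can be checked with a small simulation. The sketch adds a hypothetical two-flop synchronizer (`TwoFlopSync`) on each handshake control signal and models meta-stability crudely: an edge that samples a just-changed input resolves to the old or the new value at random, which at most delays recognition by one cycle. The data signals are deliberately not synchronized; the target samples them only after the synchronized valid shows a new request. The synchronizer structure and the random resolution model are assumptions for illustration, not taken from the text.

```python
import random


class TwoFlopSync:
    """Hypothetical two-flop synchronizer. If the input changed since the
    previous active edge, that edge is assumed to hit the setup-and-hold
    window and the first flop resolves to the old or new value at random.
    A stable input is always captured cleanly."""

    def __init__(self, rng):
        self.rng = rng
        self.last_in = 0   # input value at the previous active edge
        self.ff1 = 0       # first flop (may go meta-stable)
        self.ff2 = 0       # second flop (read by the local state machine)

    def clock(self, d):
        self.ff2 = self.ff1
        if d == self.last_in or self.rng.random() >= 0.5:
            self.ff1 = d   # clean capture, or resolved to the new value
        # otherwise ff1 keeps the old value: one extra cycle of delay
        self.last_in = d


def two_phase_with_sync(data, seed=0, period_i=3, period_t=5, max_time=10_000):
    rng = random.Random(seed)
    sync_valid = TwoFlopSync(rng)     # valid as seen in the clk_t domain
    sync_accept = TwoFlopSync(rng)    # accept as seen in the clk_i domain
    valid, accept, out = 0, 0, None
    received, idx = [], 0
    for t in range(max_time):
        valid_seen, accept_seen, out_seen = valid, accept, out
        if t % period_i == 0:                        # clk_i edge
            if valid == sync_accept.ff2 and idx < len(data):
                out, idx = data[idx], idx + 1        # held until accepted
                valid ^= 1
            sync_accept.clock(accept_seen)
        if t % period_t == 0:                        # clk_t edge
            if sync_valid.ff2 != accept:
                received.append(out_seen)            # data is stable here
                accept ^= 1
            sync_valid.clock(valid_seen)
        if idx == len(data) and valid == accept:
            break
    return received
```

For every seed the target receives the elements unchanged and in order; the random resolution can only change how many cycles each handshake takes.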
3.1.4 SOC Communication Protocols
The example in Fig. 3.8 shows a unidirectional communication link between an initiator and a target. This link consists of one signal group that comprises a valid handshake signal, an accept handshake signal, and a set of associated data signals. The handshake signals ensure the correct transfer of the data signals without meta-stability problems. Modern communication protocols, such as the AXI protocol [1], the device transaction level (DTL) protocol [14], and the open core protocol (OCP) [15], use a bidirectional communication link between an initiator and a target that comprises a request channel and a response channel (refer to Fig. 3.17). These two communication channels transfer the write and read transactions between the initiator and the target. A transaction comprises a request message from the initiator to the target and an optional response message from the target to the initiator. Each message consists of one or more data elements that are individually transferred between the initiator and the target using a handshake. A write transaction consists of a write request message and an optional write response message. The write request message contains a write command element and one or more elements with the data to write. The optional write response message contains a write acknowledge element. A read transaction consists of a read request message and a read response message. The read request message contains a read command element, while the read response message contains one or more elements with the data that was read.
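The hierarchy of transactions, messages, and data elements can be captured in a few Python data classes. This is an illustrative model with assumed names (`Message`, `Transaction`, `write_transaction`, `read_transaction`); it only counts how many element-level handshakes a transaction needs and does not model any specific protocol such as AXI, DTL, or OCP.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    elements: List[str]          # each element is transferred by one handshake


@dataclass
class Transaction:
    request: Message             # initiator -> target (request channel)
    response: Optional[Message]  # target -> initiator (response channel)

    def handshakes(self) -> int:
        """Number of element-level handshakes needed for this transaction."""
        n = len(self.request.elements)
        return n + (len(self.response.elements) if self.response else 0)


def write_transaction(data, with_ack=True) -> Transaction:
    request = Message(["write cmd"] + [f"write data {d}" for d in data])
    response = Message(["write ack"]) if with_ack else None   # optional
    return Transaction(request, response)


def read_transaction(read_data) -> Transaction:
    request = Message(["read cmd"])
    response = Message([f"read data {d}" for d in read_data])  # always present
    return Transaction(request, response)
```

For example, a write of three data elements with an acknowledge followed by a single-element read takes 5 + 2 = 7 element handshakes.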
Fig. 3.17 Communication request and response channels, transactions, messages, and data elements (write transaction: a write request message of a write cmd element and write data elements on the request channel, and a write response message with a write ack element on the response channel; read transaction: a read request message with a read cmd element and a read response message with read data)
Table 3.2 Main signals and signal groups of the DTL communication protocol [14]
Name             Source (a)  Description
System group
clk              S           DTL clock
rst_an           S           Asynchronous DTL reset
Command group
cmd_read         I           Command read operation
cmd_addr         I           Command address
cmd_block_size   I           Command block size
cmd_rd_mask      I           Command read mask
cmd_valid        I           Command valid
cmd_accept       T           Command accept
Write group
wr_data          I           Write data
wr_mask          I           Write data byte mask
wr_last          I           Write last
wr_valid         I           Write valid
wr_accept        T           Write accept
Read group
rd_data          T           Read data
rd_last          T           Read last
rd_valid         T           Read valid
rd_accept        I           Read accept
(a) S = system, I = initiator, T = target
The operating principles of these protocols are very similar. We therefore illustrate these principles using the DTL protocol below, because we also use this protocol in our case study in Chap. 8. Table 3.2 gives an overview and short description of the main signal groups and signals involved in the DTL communication between an initiator and a target [14]. Table 3.2 also lists the source of each signal.
The DTL protocol is a synchronous communication protocol, i.e., it requires the initiator and target to be part of the same clock domain. This allows the communication link to reach its maximum throughput of one data element per clock cycle during data transfers. The protocol, however, still prescribes the use of handshake control signals to control the data transfer between the initiator and the target. The handshake control signals allow the initiator and target to execute independently from each other when they are not communicating with each other. The initiator only stalls when it has data to communicate to the target and the target is not yet ready to receive this data.

Fig. 3.18 Example DTL write and read transactions, based on [14] (waveforms of clk, cmd_read, cmd_addr, cmd_rd_mask (0xF), cmd_block_size (0x02, then 0x00), cmd_valid, cmd_accept, wr_data (d1, d2, d3), wr_mask (mask1, mask2, mask3), wr_last, wr_valid, wr_accept, rd_data (d4), rd_last, rd_valid, and rd_accept; markers 1 to 4)
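In a synchronous valid/accept handshake, a data element is commonly considered transferred on an active clock edge at which both valid and accept are high. The sketch below assumes that rule (a general assumption, not quoted from the DTL specification [14]), an initiator that always has data available, and a hypothetical repeating `ready_pattern` for the target's accept signal. It shows the throughput of one element per cycle when the target is always ready, and the stalls that occur when it is not.

```python
from itertools import cycle


def sync_transfer(data, ready_pattern=(1,)):
    """Return (received elements, clock cycles used). ready_pattern is the
    target's accept value per cycle, repeated; it must contain at least one 1."""
    if 1 not in ready_pattern:
        raise ValueError("target never accepts: transfer would not finish")
    ready = cycle(ready_pattern)
    received, cycles, i = [], 0, 0
    while i < len(data):
        cycles += 1
        valid = 1                      # initiator holds data[i] with valid high
        accept = next(ready)           # target readiness in this cycle
        if valid and accept:           # transfer completes on this clock edge
            received.append(data[i])
            i += 1
        # otherwise the initiator stalls and keeps data[i] stable
    return received, cycles
```

With the target always ready, four elements take four cycles; if the target accepts only every other cycle, the same four elements take seven cycles.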
The DTL protocol uses three signal groups to transfer commands and data between the initiator and the target, even though the use of a single signal group for each channel is sufficient. SOC protocols typically use multiple signal groups to support pipelined and concurrent transactions and thereby obtain a higher system performance.
Figure 3.18 shows an example write transaction and an example read transaction on a DTL communication link, based on the DTL protocol specification in [14]. The initiator starts the transaction in both cases. It sends a write command element or a read command element, respectively, with command information to the target, using the signals in the command group (indicated with 1 and 2 in Fig. 3.18). This information includes the type of command (“cmd_read”), the start address (“cmd_addr”), and the block size (“cmd_block_size”). The validation of this information by the initiator and its subsequent acceptance by the target are indicated by the “cmd_valid” and the “cmd_accept” handshake signals, respectively. Write transfers take place similarly to the transfer of the command, i.e., the initiator provides the data to write by means of the signals in the write group (indicated as 3 in Fig. 3.18). This information includes the data to write (“wr_data”), an optional byte mask (“wr_mask”), and a flag indicating whether the current data element is the last element in the write request message (“wr_last”). The validation of the write data by the initiator and its subsequent acceptance by the target are indicated by the “wr_valid” and the “wr_accept” handshake signals, respectively. In the example write transaction in Fig. 3.18, the command element specifies a write operation (“cmd_read=0”) and a block size of three elements (“cmd_block_size=2”). These three data elements are subsequently transferred from the initiator to the target. The transfer of the last write data element is indicated by the assertion of the “wr_last” signal.
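The element sequence of such a write transaction can be sketched as follows. The dictionary representation, the helper names `dtl_write` and `target_receive`, and the address in the usage example are hypothetical. The encoding of the block size as the number of elements minus one is inferred from the example above (“cmd_block_size=2” for three elements) and should be treated as an assumption; the sketch models which values are transferred, not their cycle timing.

```python
def dtl_write(addr, data, masks):
    """Build the command element and the write-data elements of a DTL-style
    write. Block size is encoded as (number of elements - 1), as in the
    example where cmd_block_size=2 announces three elements."""
    if not data or len(data) != len(masks):
        raise ValueError("need at least one data element and one mask per element")
    cmd = {"cmd_read": 0, "cmd_addr": addr, "cmd_block_size": len(data) - 1}
    writes = [
        {"wr_data": d, "wr_mask": m, "wr_last": int(k == len(data) - 1)}
        for k, (d, m) in enumerate(zip(data, masks))
    ]
    return cmd, writes


def target_receive(cmd, writes):
    """Collect write data until wr_last, checking it against cmd_block_size."""
    received = []
    for element in writes:
        received.append(element["wr_data"])
        if element["wr_last"]:
            break
    if len(received) != cmd["cmd_block_size"] + 1:
        raise ValueError("wr_last does not match cmd_block_size")
    return received
```

For example, `dtl_write(0x100, ["d1", "d2", "d3"], ["mask1", "mask2", "mask3"])` yields a command with `cmd_block_size` 2 and three write elements, of which only the last has `wr_last` set.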
A read response message is transferred in the opposite direction compared to a read request message, i.e., the target provides the data that was read by means of the signals in the read group (indicated as 4 in Fig. 3.18). This information includes the data that was read (“rd_data”) and a flag indicating whether the current data element is the last element in the read response message (“rd_last”). The validation of the read data by the target and its subsequent acceptance by the initiator are controlled by the “rd_valid” and “rd_accept” handshake signals, respectively.
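The read direction can be sketched in the same style. The helper names, the word addressing (one memory entry per element), and the block-size encoding as the number of elements minus one are assumptions for illustration, consistent with the write example but not taken from the DTL specification.

```python
def dtl_read_response(memory, addr, block_size):
    """Target side: return the read-data elements for a read command.
    Assumes word addressing (one memory entry per element) and a block size
    encoded as (number of elements - 1)."""
    n = block_size + 1
    return [
        {"rd_data": memory[addr + k], "rd_last": int(k == n - 1)}
        for k in range(n)
    ]


def initiator_collect(elements):
    """Initiator side: accept read-data elements until rd_last is set."""
    data = []
    for element in elements:
        data.append(element["rd_data"])
        if element["rd_last"]:
            break
    return data
```

A read with block size 0, like the single-element read in Fig. 3.18, returns exactly one element with `rd_last` set.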
To enable communication between asynchronous building blocks using the DTL communication protocol, an SOC design team has to use a so-called clock domain crossing (CDC) module [2]. A CDC module consists of two asynchronous building blocks that communicate with each other using an asynchronous protocol (refer to Fig. 3.19). An initiator block in clock domain a can use the DTL interface on the initiator side of the CDC module to write data into the CDC module. A target block in clock domain b can use the DTL interface on the target side of the CDC module to read this data. A memory inside the CDC module keeps track of the data elements that have been written but not yet read.
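The text does not detail the internals of the CDC module. A common way to realize the memory it describes is a dual-clock FIFO; the sketch below assumes that structure, with a write counter and a read counter that each cross to the other clock domain through two flops. A real design would typically pass these counters as Gray code so that only one bit changes per increment; here plain integers are copied, which is only safe in a simulation. All names and parameters (`cdc_fifo_run`, `depth`, `period_a`, `period_b`) are hypothetical.

```python
def cdc_fifo_run(data, depth=4, period_a=2, period_b=7, max_time=100_000):
    """Move data from clock domain a to clock domain b through a dual-clock
    FIFO of the given depth and return the elements read in domain b."""
    mem = [None] * depth
    wr_cnt = rd_cnt = 0      # elements written (domain a) / read (domain b)
    wr_sync = [0, 0]         # wr_cnt passed through two clk_b flops
    rd_sync = [0, 0]         # rd_cnt passed through two clk_a flops
    out = []
    for t in range(max_time):
        wr_now, rd_now = wr_cnt, rd_cnt          # values before this instant
        if t % period_a == 0:                    # clk_a edge: write side
            # Conservative full check: rd_sync[1] never exceeds rd_cnt.
            if wr_cnt < len(data) and wr_cnt - rd_sync[1] < depth:
                mem[wr_cnt % depth] = data[wr_cnt]
                wr_cnt += 1
            rd_sync = [rd_now, rd_sync[0]]
        if t % period_b == 0:                    # clk_b edge: read side
            # Conservative empty check: wr_sync[1] never exceeds wr_cnt.
            if rd_cnt < wr_sync[1]:
                out.append(mem[rd_cnt % depth])
                rd_cnt += 1
            wr_sync = [wr_now, wr_sync[0]]
        if len(out) == len(data):
            break
    return out
```

Because each side only sees a delayed copy of the other side's counter, the FIFO may report full or empty slightly too long, but it never overwrites unread data or reads unwritten data.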
Fig. 3.19 Block diagram of a CDC module (clock domain a with clk_a and clock domain b with clk_b; DTL interfaces with command, write, and read groups on the initiator side and the target side; the two sides of the CDC module communicate via an asynchronous interface)

3.1.5 Variable Communication Duration

In exchange for a correct data transfer between an initiator and a target in different clock domains, asynchronous communication protocols introduce a variation in the duration of the handshakes. This is illustrated in Figs. 3.20 and 3.21 for the two-phase, asynchronous communication protocol. In Fig. 3.20, it takes two active edges on the initiator clock to complete Handshake I, measured from the active edge on which the initiator toggles its valid signal to the active edge on which the initiator samples a toggled accept signal from the target. Handshake II, however, takes three active edges on the initiator clock to complete. This difference in duration is caused by a difference in the clock periods of the initiator and target, and by the phase difference between the two clocks at the start of the handshakes (refer to Δφ1 and Δφ2 in Fig. 3.20). The handshakes in Fig. 3.20 are examples in which the control signals from the initiator and the target are sampled without meta-stability.

Fig. 3.20 Non-determinism at clock-cycle level due to clock phase differences (Handshakes I and II: initiator state, target state, initiator clock, target clock, valid, and accept; phase differences Δφ1 and Δφ2)
Figure 3.21 shows another example handshake, where the target samples the valid control signal from the initiator with meta-stability, because the valid signal is toggled by the initiator in the setup-and-hold interval around an active edge on the target clock signal. In this example, it takes one additional clock cycle of the target clock to resolve this meta-stability. As a result, Handshake III takes a total of four active edges on the initiator clock. This difference in the duration of the handshake is visible as a variable communication latency between the two building blocks involved, which the SOC applications have to take into account. We discuss the consequences of this variable delay next, in Sects. 3.2 and 3.3.
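The dependence of the handshake duration on the clock phase, and on meta-stability, can be reproduced with a small calculation. The clock periods and phase offsets used below are hypothetical (the text does not give the values behind Figs. 3.20 and 3.21), and `extra_target_cycles` is a simple stand-in for the time needed to resolve meta-stability on valid. Durations are counted as in the text, from the initiator edge that toggles valid up to the initiator edge that samples the toggled accept.

```python
def handshake_edges(period_i, period_t, phase_t, extra_target_cycles=0,
                    max_time=10_000):
    """Count initiator active edges, inclusive, from the edge at t = 0 on which
    valid is toggled to the edge on which the toggled accept is first sampled.

    Initiator edges occur at multiples of period_i; target edges at
    phase_t + k * period_t. A target edge exactly at t = 0 would still sample
    the old valid, so the loop starts at t = 1. extra_target_cycles models
    meta-stability that takes that many additional target cycles to resolve.
    """
    valid, accept = 1, 0       # valid was toggled on the initiator edge at t = 0
    edges = 1                  # that edge counts as the first one
    pending = 0                # target edges that have seen the toggled valid
    for t in range(1, max_time):
        valid_seen, accept_seen = valid, accept   # values before this instant
        if t >= phase_t and (t - phase_t) % period_t == 0:   # target edge
            if valid_seen != accept:
                pending += 1
                if pending > extra_target_cycles:
                    accept ^= 1                    # target toggles accept
        if t % period_i == 0:                      # initiator edge
            edges += 1
            if accept_seen == valid:               # toggled accept observed
                return edges
    raise RuntimeError("handshake did not complete within max_time")
```

With an initiator period of 4 and a target period of 5 time units, a target phase offset of 1 gives 2 initiator edges, an offset of 4 gives 3, and one extra resolution cycle on the target raises the latter to 4, qualitatively matching Handshakes I, II, and III.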
Fig. 3.21 Non-determinism at clock-cycle level due to meta-stability (Handshake III: initiator state, target state, valid, accept, initiator clock, and target clock)
Fig. 3.22 Example SOC with three building blocks (clock domains A, B, and C with clocks clk_a, clk_b, and clk_c; Initiator 1 (producer) and Initiator 2 (consumer) connected to a shared target (shared memory) via data_p, request_p, ack_p and data_c, request_c, ack_c)
3.2 Resource Sharing Between Building Blocks
It is common in a large SOC for a set of building blocks to require access to the same resource. For example, many of the building blocks in the SOC block diagram in Fig. 1.5 need write and read access to the external memory. The memory controller, shown at the top of Fig. 1.5, arbitrates between the write and read requests from these building blocks by deciding the order in which they are applied to the off-chip SDRAM. The SDRAM is a shared resource in this SOC. The memory controller is the arbiter for this shared resource.

Figure 3.22 shows an Initiator 1 and an Initiator 2 that both require the services of a shared target. This shared target can, however, only accept and execute a single request at a time. Because the requests come to this target via separate ports, this target has to decide the order in which it services the requests from these two initiators. This process is called arbitration. Common resource arbitration algorithms include static, first-in/first-out, shortest job first, priority-based, and round-robin arbitration. We do not explore these specific algorithms further here, but instead analyze the possible effects that using a GALS design style has on the arbitration process.
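To make the effect of arrival order concrete, the sketch below models a first-in/first-out (first-come, first-served) arbiter in front of a shared memory with a single location. The request fields, the synchronization latencies, and the tie-breaking rule by port number are hypothetical; the point is only that the same two requests can be served in a different order when their latencies to the arbiter differ.

```python
from typing import List, NamedTuple, Optional


class Request(NamedTuple):
    issue_time: int          # time the initiator issues the request
    sync_latency: int        # delay until it is visible in the arbiter's domain
    port: int                # arbiter input port; lower port wins ties
    op: str                  # "write" or "read"
    value: Optional[int] = None


def fcfs_arbitrate(requests: List[Request], initial: int = 0) -> List[int]:
    """Serve requests in order of arrival at the arbiter and return the
    values returned by the read requests, in service order."""
    mem = initial            # a single shared-memory location
    reads = []
    order = sorted(requests, key=lambda r: (r.issue_time + r.sync_latency, r.port))
    for r in order:
        if r.op == "write":
            mem = r.value
        else:
            reads.append(mem)
    return reads
```

In both cases below the producer issues its write before the consumer issues its read; only the latencies differ, yet the consumer reads 42 in the first case and the stale initial value 0 in the second: `fcfs_arbitrate([Request(0, 2, 1, "write", 42), Request(1, 2, 2, "read")])` and `fcfs_arbitrate([Request(0, 3, 1, "write", 42), Request(1, 1, 2, "read")])`.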
In a GALS SOC, the request signals from initiators in other clock domains first need to be synchronized to the clock domain of the arbiter. This synchronization process prevents meta-stability problems, but introduces variable latencies in the communication of both requests to the shared target (refer to Sect. 3.1.5). The synchronized requests are subsequently combined with the requests originating from the arbiter’s own clock domain and serviced in the order determined by the arbitration algorithm. Depending on the arbitration algorithm used, the arrival times of the requests at the inputs of the arbiter may have an impact on the order in which these requests are subsequently handled. A difference in this order may in turn influence the SOC execution.

Fig. 3.23 “write before read” scenario (states s1_p to s3_p of the producer, s1_m to s3_m of the shared memory, and s1_c to s3_c of the consumer over time t)

Figures 3.23 and 3.24 illustrate this phenomenon for the two initiators and their shared target; Initiator 1 is a producer of data and Initiator 2 is a consumer of data. Both initiators communicate via separate ports with a shared memory. This shared memory can only accept and execute a single request at a time. In Fig. 3.23, the producer is the first initiator to start a write request, which is soon followed by a read request of the consumer. The producer’s request is the first request to arrive at the shared memory and is therefore executed first. Afterwards, the request of the consumer is executed by the shared memory. Figure 3.24 shows another possible scenario, with a different transaction sequence. This time, due to different latencies on the communication path between the producer and the shared memory, and between the consumer and the shared memory, the read request of the consumer arrives before the write request of the producer. The read request of the consumer is therefore executed before the write request of the producer. The response message to the consumer may be different from what it was in the scenario in Fig. 3.23, because, for example, the consumer requested data from the shared memory before the producer was able to write it. This is reflected in the state of the shared memory when the read request of the consumer comes in. In Fig. 3.23, this state
Fig. 3.24 “read before write, and then reread” scenario (states s1_p to s3_p of the producer, s1_m to s4_m of the shared memory, and s1_c, s2_c, s4_c, s2_c, s3_c of the consumer over time t; read data returned to the consumer)