
Asynchronous System-on-Chip Interconnect

183 Pages·2000·3.06 MB·English
by William John Bainbridge


Asynchronous System-on-Chip Interconnect

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science & Engineering

March 2000

William John Bainbridge
Department of Computer Science

Contents

Contents
List of Figures
List of Tables
Abstract
Declaration
Copyright
The Author
Acknowledgements

Chapter 1: Introduction
  1.1 Asynchronous design and its advantages
    1.1.1 Avoidance of clock-skew
    1.1.2 Low power
    1.1.3 Improved electro-magnetic compatibility (EMC)
    1.1.4 Modularity
    1.1.5 Better than worst-case performance
  1.2 Disadvantages of asynchronous design
    1.2.1 Complexity
    1.2.2 Deadlock
    1.2.3 Verification
    1.2.4 Testability
    1.2.5 “It’s not synchronous”
  1.3 Thesis Overview
  1.4 Publications

Chapter 2: Asynchronous Design
  2.1 Introduction
  2.2 Asynchronous design
    2.2.1 Circuit classification
    2.2.2 The channel
    2.2.3 Signalling conventions
    2.2.4 Data representation
    2.2.5 The Muller C-element
    2.2.6 Specifications and automated circuit synthesis
    2.2.7 Metastability, arbitration and synchronisation
    2.2.8 Sutherland’s micropipelines
    2.2.9 Large Asynchronous Circuits
  2.3 Summary

Chapter 3: System Level Interconnect Principles
  3.1 Point-to-point communication paths
  3.2 Multipoint interconnect topology
    3.2.1 Shared buses
    3.2.2 Star and Ring Networks
    3.2.3 Meshes
  3.3 Bus protocol issues
    3.3.1 Serial operation
    3.3.2 Multiplexed address/data lines
    3.3.3 Separate address and data lines
    3.3.4 Arbitration
    3.3.5 Atomic sequences
    3.3.6 Bursts
    3.3.7 Interlocked or decoupled transfers
    3.3.8 Split transactions
  3.4 Interconnect performance objectives
  3.5 Commercial on-chip buses
    3.5.1 Peripheral Interconnect Bus (PI-Bus)
    3.5.2 The Advanced Microcontroller Bus Architecture (AMBA)
    3.5.3 CoreConnect
  3.6 Summary

Chapter 4: The Physical (Wire) Layer
  4.1 Wire theory
  4.2 Electrical and physical characteristics
  4.3 Termination
  4.4 Crosstalk
    4.4.1 Propagation delay for well separated wires
    4.4.2 Signal propagation delay with close-packed wires
    4.4.3 Alternative wiring arrangements
  4.5 Summary

Chapter 5: The Link Layer
  5.1 Centralised vs distributed interfaces
  5.2 Signalling Convention
  5.3 Data Encoding
  5.4 Handshake sources
  5.5 Bidirectional data transfer
  5.6 Multiple initiators on one channel
    5.6.1 Arbitration
    5.6.2 Request drive and hand-over
    5.6.3 Push data drive and hand-over
    5.6.4 Transfer deferral/hardware retry
    5.6.5 Atomic transfers and locking
  5.7 Multiple Targets
    5.7.1 Acknowledge drive and hand-over
    5.7.2 Target selection
    5.7.3 Decode and target exceptions
    5.7.4 Pull data drive and hand-over
    5.7.5 Defer
  5.8 Multipoint bus-channel interfaces
  5.9 MARBLE’s Link Layer Channels
  5.10 Summary

Chapter 6: Protocol Layer
  6.1 Transfer phases
    6.1.1 The command phase
    6.1.2 The acknowledge phase
    6.1.3 The data phase
    6.1.4 The response phase
  6.2 Exceptions
  6.3 Defer and bridging
  6.4 Mapping transfer phases onto channel cycles
    6.4.1 Sequential operation using a single channel
    6.4.2 Parallel operation using multiple channels
  6.5 Transfer cycle routing
    6.5.1 Interlocked protocols
    6.5.2 Decoupled protocols
  6.6 Transfer cycle initiation
  6.7 MARBLE’s dual channel bus architecture

Chapter 7: Transaction Layer
  7.1 Split transactions
    7.1.1 Split transactions give better bus availability
    7.1.2 Implementation on an interlocked protocol layer
    7.1.3 Implementation on a decoupled protocol-layer
  7.2 Response ordering
    7.2.1 Single outstanding command
    7.2.2 Multiple outstanding commands and pipelining
    7.2.3 Number of outstanding commands
    7.2.4 A grouping of single outstanding command interfaces
    7.2.5 Sequence tagging and reordering of responses
  7.3 MARBLE’s Transaction Layer

Chapter 8: MARBLE: A dual channel split transfer bus
  8.1 MARBLE protocol and signal summary
    8.1.1 Two channels
    8.1.2 Split transactions
    8.1.3 Exceptions
    8.1.4 Arbitration
    8.1.5 Atomic transactions and locking
    8.1.6 Burst optimisation
  8.2 Bus transaction interface implementation
    8.2.1 Interface structure
    8.2.2 Data storage and manipulation
    8.2.3 Token management
  8.3 MARBLE in the AMULET3H System
    8.3.1 The AMULET3 Processor Core
    8.3.2 RAM
    8.3.3 ROM
    8.3.4 DMA Controller
    8.3.5 External Memory/Test Interface
    8.3.6 ADC/AEDL
    8.3.7 SOCB
    8.3.8 Instruction bridge and local bus
    8.3.9 Data bridge and local bus
  8.4 Summary

Chapter 9: Evaluation
  9.1 The MARBLE testbed
  9.2 Simulation of MARBLE in AMULET3H
    9.2.1 Single initiator to single target
    9.2.2 Two initiators accessing different targets
    9.2.3 Two initiators accessing the same target
    9.2.4 Three Initiators accessing different targets
  9.3 Analysis of delay distribution
    9.3.1 Centralised and Distributed Decoding
    9.3.2 Arbitration
    9.3.3 Data drive setup time
    9.3.4 Pipeline latch controller delays
    9.3.5 Sender activity
    9.3.6 Performance summary
  9.4 Hardware requirements
  9.5 Comparison with synchronous alternatives

Chapter 10: Conclusion
  10.1 Advantages and disadvantages of MARBLE
    10.1.1 Increased modularity
    10.1.2 Avoidance of clock-skew
    10.1.3 Low power consumption
    10.1.4 Improved electro-magnetic compatibility (EMC)
    10.1.5 Performance
    10.1.6 Risk of deadlock
    10.1.7 Timing verification
    10.1.8 Design complexity
    10.1.9 Reliable arbitration
  10.2 Improving the MARBLE bus
    10.2.1 Separating the read and write data paths
    10.2.2 Less conservative drive overlap prevention
    10.2.3 Allowing multiple outstanding commands
  10.3 Alternative interconnect solutions and future work
    10.3.1 Changing the interconnect topology
    10.3.2 Changing to a delay-insensitive data encoding
  10.4 The future of asynchronous SoC interconnect?

Appendix A: MARBLE Schematics
  A1 Bus Interface Top Level Schematics
  A2 Initiator Interface Controllers
  A3 Target Interface Controllers
  A4 Bus drivers and buffers
  A5 Latch controllers
  A6 Centralised bus control units

References

List of Figures

1.1 Bus interface modules
2.1 Channel signalling protocols
2.2 4-bit dual-rail completion detection logic
2.3 2-of-5 Threshold Gate
2.4 NCL Four-bit word completion detection logic
2.5 Muller C-elements, their function and their implementation
2.6 STG notation
2.7 STG specification for a 2 input Muller C-element
2.8 CMOS mutex implementation
2.9 Micropipeline event control modules
3.1 Fully interconnected bus-channel matrix
4.1 A layered bus hierarchy
4.2 Infinitesimal length wire model
4.3 Resistance and capacitance for nine 1mm long close packed wires
4.4 SPICE plot of 9-wire system
4.5 Signal propagation delay (transit time) for an isolated wire
4.6 Signal propagation delay (transit time) for a close-packed wire
4.7 The effect of wire formation and crosstalk on transmission delay
4.8 Delay versus datapath width per wire
5.1 A layered bus interface
5.2 Centralised and distributed interfaces
5.3 Multipoint (bus) channel wiring
5.4 4-phase channel with both push and pull data paths
5.5 Arbitrated-call
5.6 Hidden arbitration
5.7 Arbitration using cascaded mutexes
5.8 Ring arbiter
5.9 Tree arbiter element, after [46]
5.10 4-way tree arbiters
5.11 Data-path drive
5.12 Bridging between two bus-channels
5.13 Interaction of lock and defer with hidden arbitration
5.14 Centralised 4-phase acknowledge merge
5.15 Multipoint bus channel initiator interface
5.16 Multipoint bus channel target interface
6.1 Bus interface module hierarchical structure
6.2 Transfer Phase Combinations / Channel Mappings
6.3 Example of dual channel bus activity
6.4 Causing a command and response to overlap
6.5 Interlocked protocol STG
7.1 Bus interface modules
7.2 Dual-role split transaction initiator interface
7.3 Two decoupled commands and two responses
7.4 Possible order of bus cycles for two transfers
7.5 Bus transfer control flow
7.6 Interlocking and single outstanding command constraints
7.7 Adding pipelining support to a target interface
7.8 Four outstanding command transaction bridge
7.9 Adding a reorder buffer
7.10 Low latency FIFO buffer and reorder buffer implementation
8.1 Bus bridge interfaces
8.2 MARBLE Initiator and Target bridge structures
8.3 Long-hold Latch Controllers
8.4 The AMULET3h system
8.5 AMULET3 core organisation (from [4])
8.6 1KB RAM block
8.7 SOCB timing diagram
8.8 Instruction Bridge and local bus
8.9 Data bridge and local bus
8.10 AMULET3H Die Plot
9.1 AMULET3 fetching code from the ROM
9.2 Trace showing interleaved CPU fetches and data accesses
9.3 CPU fetches and data accesses both using the ROM
9.4 Three initiators accessing four targets
9.5 Tree arbiter element delays
10.1 A ring network
10.2 A suitable layout for a 1-of-4 encoding

List of Tables

4.1 Legend codes for figures 4.7 and 4.8
5.1 Arbitration latency
6.1 Wire requirements if using only one channel for the entire transfer
6.2 Wire requirements if using separate channels for each cycle of a transfer
8.1 MARBLE Command/Address channel signals
8.2 MARBLE Response/Data channel signals
8.3 MARBLE support signals
8.4 MARBLE sequentiality codes
8.5 SOCB signals
9.1 Handshake phase durations for a CPU fetch from ROM (from figure 9.4)
9.2 Delays in driving the bus channel
9.3 Latch controller performance
9.4 MARBLE interface and bus control hardware costs
9.5 Bus Performance Figures

Abstract

A shared system bus is a key feature of modern system-on-chip design methodologies. It allows the independent development of major macrocells which are then brought together in the final stages of development. The use of a synchronous bus in a synchronous design brings with it problems as a result of clock-skew across the chip and the use of many timing domains in a system. In an asynchronous system, the use of a synchronous bus would subvert many of the benefits offered by asynchronous logic such as reduced electromagnetic emissions.

This thesis describes an asynchronous system-on-chip bus which offers a solution to such problems. Existing shared-bus techniques are re-investigated in the context of an asynchronous implementation and a complete bus design is presented that has been developed for use in an asynchronous subsystem of a mixed-synchrony chip. This chip will imminently form part of one of the first commercially available products to incorporate components that use asynchronous VLSI techniques.

The split-transfer primitive, often avoided or added as an optional extension by synchronous designers, is used as the basis for the chosen bus architecture. It offers a fine-grained interleaving of bus activity and a better bus availability than would an interlocked-transfer technique as found in many synchronous alternatives. This technique is viable in an asynchronous design because of the very low arbitration latency.

Simulation results show that the proposed architecture achieves a performance comparable with synchronous buses that use similar levels of resource, whilst maintaining the benefits of the asynchronous design style.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright

(1). Copyright in text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

(2). The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without permission of the University, which will prescribe the terms and conditions of any such agreement.

Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the Department of Computer Science.
