Design of Interconnection Networks for Programmable Logic

GUY LEMIEUX
Assistant Professor
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, BC
Canada

DAVID LEWIS
Software Architect
Altera Toronto Technology Centre
Altera Corporation
Professor
Edward S. Rogers Senior Department of Electrical and Computer Engineering
University of Toronto
Toronto, ON
Canada

Springer Science+Business Media, LLC

Library of Congress Cataloging-in-Publication
Title: Design of Interconnection Networks for Programmable Logic
Author(s): Guy Lemieux and David Lewis
ISBN 978-1-4419-5415-2
ISBN 978-1-4757-4941-0 (eBook)
DOI 10.1007/978-1-4757-4941-0

Copyright © 2004 by Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 2004
Softcover reprint of the hardcover 1st edition 2004

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photo-copying, microfilming, recording, or otherwise, without the prior written permission of the publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Permissions for books published in the USA: permissions@wkap.com
Permissions for books published in Europe: [email protected]

Printed on acid-free paper.

This book is dedicated to our families.

Contents

Dedication
List of Figures
List of Tables
List of Symbols
Foreword

1. INTRODUCTION
   1.1 Trends in Digital Design
   1.2 Book Outline
   1.3 Summary

2. INTERCONNECTION NETWORKS
   2.1 Multi-Stage Networks
   2.2 PLD Routing Architectures
   2.3 Selected Commercial PLDs

3. MODELS, METHODOLOGY AND CAD TOOLS
   3.1 Mesh (Island-Style) Architecture
   3.2 Mesh Architecture Model Details
   3.3 Area and Delay Models
   3.4 CAD Flow and Experimental Methodology

4. SPARSE CROSSBAR DESIGN
   4.1 Introduction
   4.2 Crossbars and Connection Blocks
   4.3 Graph Representation
   4.4 Evaluating Routability
   4.5 Routable Switch Patterns
   4.6 Switch Placement Algorithm
   4.7 Results
   4.8 Design Examples
   4.9 Conclusions
   4.10 Future Work

5. SPARSE CLUSTER DESIGN
   5.1 Introduction
   5.2 Methodology
   5.3 Architecture Parameters
   5.4 Results
   5.5 Comparison to Previous Work
   5.6 Conclusions
   5.7 Future Work

6. ROUTING SWITCH CIRCUIT DESIGN
   6.1 Introduction
   6.2 Methodology
   6.3 Detailed Circuit Design
   6.4 Fanin-Based Switches
   6.5 Fanin-Based Switch Results
   6.6 Buffer/Pass Architectures
   6.7 Buffer/Pass Architecture Results
   6.8 Conclusions
   6.9 Future Work

7. SWITCH BLOCK DESIGN
   7.1 Introduction
   7.2 Switch Blocks
   7.3 Switch Block Design Framework
   7.4 Framework Applications
   7.5 Results
   7.6 Conclusions
   7.7 Future Work

8. CONCLUSIONS

Appendices

A Switch Blocks: Reduced Flexibility
   A.1 Introduction
   A.2 Biased Fs = 2 Style
   A.3 Asymmetric Fs = 2 Style
   A.4 Results
   A.5 Summary

B Switch Blocks: Diverse Design Instances

C VPRx: VPR Extensions
   C.1 Determination of Router Effort
   C.2 Routing Graph and Netlist Changes (Sparse Clusters)
   C.3 Area and Delay Calculation Improvements
   C.4 Runtime Improvements
   C.5 Experimental Noise Reduction
   C.6 Correctness Changes

References
Index

List of Figures

1.1 Range of area requirements for different circuits.
2.1 No. 5 crossbar switching network.
2.2 Clos network.
2.3 Beneš network for 16 inputs and 16 outputs.
2.4 Recursive construction of a Beneš network.
2.5 Non-blocking Richards-Hwang network with full broadcast ability.
2.6 Partial crossbar network derived by folding a Clos network.
2.7 Common PLD routing network types.
3.1 A mesh-based (island-style) architecture model.
3.2 A clustered logic block (CLB).
3.3 A basic logic element (BLE).
3.4 Layout tile of a clustered logic block (CLB) and interconnect.
3.5 Layout design rules for a minimum-size transistor.
3.6 Experimental process used to evaluate PLD architectures.
4.1 A 6 x 4 full crossbar.
4.2 Examples of 6 x 4 minimal full-capacity crossbars.
4.3 Oruç-Huang guaranteed-capacity sparse crossbar construction.
4.4 Azegami guaranteed-capacity sparse crossbar construction.
4.5 A 6 x 4 minimal crossbar and its graph representation.
4.6 Routability of different switch patterns in a 80 x 12 sparse crossbar.
4.7 Flow network used to test the routability of a 6 x 4 minimal crossbar.
4.8 Overview of switch placement algorithm.
4.9 Algorithm to generate uniform fanin/fanout constraints.
4.10 Random initial switch placement algorithm.
4.11 Initial switch placement algorithm using a maximum network flow algorithm.
4.12 Iterative optimisation of switch placement.
4.13 Cost computation.
4.14 Routability of 9 x 6 sparse crossbars with different Hamming distance profiles.
4.15 Example of a bad move (left) and a good move (right).
4.16 Find eligible switch moves.
4.17 The effect of routing additional signals in a 168 x 24 crossbar.
4.18 The effect of additional switches in a 168 x 24 crossbar.
4.19 Efficiency of switches in a 168 x 24 crossbar.
4.20 Routability of 24 signals in a 168 x 24 crossbar as output wires are added.
4.21 Routability of a 168 x 24 crossbar after adding output wires and switches.
4.22 Routability of 24 signals with a fixed number of total switches.
4.23 Routability of 24 signals while varying total switch count.
4.24 Interconnect model of the Altera FLEX8000 architecture.
4.25 Routability improvements made to the FLEX8000 architecture.
4.26 Number of switches in highly routable Altera FLEX8000 organisations.
4.27 Number of transistors in highly routable Altera FLEX8000 organisations.
4.28 Interconnect model of the HP Plasma architecture.
4.29 Routability improvements made to the HP Plasma architecture.
4.30 Sizes of highly routable HP Plasma organisations.
4.31 Interconnect model of the Altera MAX7256 architecture.
4.32 Sizes of highly routable Altera MAX7256 organisations.
4.33 Effect of varying the cluster size N on interconnect size.
4.34 Effect of varying the number of top-level inputs on interconnect size.
5.1 Details of the cluster tile architecture.
5.2 Fc impact on channel width.
5.3 Fc impact on area for cluster sizes of N = 2 and 9.
5.4 Best Fc for minimum area with I = ⌊7(N + 1)/2⌋ cluster inputs.
5.5 Average of total active PLD area for fully and sparsely populated clusters.
5.6 Spare inputs reduce channel width in fully populated clusters.
5.7 Delay depends on LUT size (left), but not on switch density (right).
5.8 Area·delay for fully-populated (left) and best-area sparse (right) clusters.
6.1 End-to-end connection delay using different switch types.
6.2 Gate boosting and level restoring both reduce leakage current.
6.3 The level-restoring pulldown problem.
6.4 Multistage buffer with (optional) tristate output.
6.5 HSPICE circuit model of a length-4 routing wire and all switches.
6.6 Adjusting the sense and drive stages of a size 6 switch.
6.7 Adjusting the sense and drive stages of a size 16 switch.
6.8 Delay per wire for various switch sizes.
6.9 Area·delay product per wire for various switch sizes.
6.10 Effect of tile length on performance of a buffer-wire connection.
6.11 Best switch sizes as a function of tile length (replot of Figure 6.10 data).
6.12 Increases due to a fixed switch size in a buffer-wire connection.
6.13 Effect of tile length on performance of a buffer-wire-pass-wire connection.
6.14 Best switch sizes as a function of tile length (replot of Figure 6.13 data).
6.15 Increases due to a fixed switch size in a buffer-wire-pass-wire connection.
6.16 Impact of slow input slew rate on delay, size 16 switch.
6.17 Two fanout-based (a,b) and three fanin-based (c,d,e) switch types.
6.18 Delay per wire under fanout, normalized to bufns, size 6 switches.
6.19 Key difference between two alternating schemes.