AUTOMATED GENERATION OF ROUND-ROBIN ARBITRATION AND CROSSBAR SWITCH LOGIC

A Thesis
Presented to
The Academic Faculty
by
Eung S. Shin

In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

School of Electrical and Computer Engineering
Georgia Institute of Technology
November 2003

Approved by:
Professor Vincent J. Mooney III, Adviser
Professor George F. Riley
Professor Sung Kyu Lim
Professor Mary Ann Ingram
Professor Santosh Pande

Date Approved: 11/05/2003

“In his heart a man plans his course, but the LORD determines his steps....” – Proverbs 16:9

To my parents

ACKNOWLEDGMENTS

There are many people at Georgia Tech to whom I am thankful for their help during my Ph.D. study.

First of all, I would like to express my enormous appreciation to my adviser, Dr. Vincent J. Mooney III, from the bottom of my heart. In addition to the enthusiasm and professionalism he has dedicated to all members of our Codesign group, Dr. Mooney has supported and encouraged me in developing my thesis. In our regular weekly meetings, he has listened patiently to my ideas, and we have brainstormed through short question-and-answer sessions. He has also helped me improve my writing through logical reasoning and has corrected my English pronunciation. His technical acumen, integrity and concern for all members of Codesign are remarkable and exemplary.

Second, I am grateful to all my committee members, Dr. George F. Riley, Dr. Sung Kyu Lim, Dr. Mary Ann Ingram and Dr. Santosh Pande. In particular, Dr. Riley is the coauthor of two of my papers and has helped enhance the quality of our papers by developing a switch arbiter simulator.

I also wish to thank all members of the Codesign group. Mohamed Shalan, the coauthor of one of my papers, has helped me with productive discussions of our research interests and with constructive advice on simulations.
Pramote Kuacharoen, Tankut Akgul and his wife Bilge Saglam Akgul have encouraged me with helpful advice and hilarious jokes and have broadened my research interests. Jun Cheol Park and Kyeong-Keol Ryu have been supportive in every aspect, in addition to helping with power estimation and debugging my design.

Finally, I wish to thank my parents, who gave me life and have supported me throughout it. Without their encouragement and support, I doubt that I could have brought my thesis into final form. They have believed in my ability to complete my thesis and have made me confident in pursuing my goal. I also thank my brother, Eung Seok Shin, and my sister, You Yong Shin, for their encouragement.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
SUMMARY
I INTRODUCTION
  1.1 Problem Statement
  1.2 Thesis Contributions
  1.3 Motivation
II TERMINOLOGY FOR AN ARBITER
III RELATED WORK
  3.1 Arbiters
    3.1.1 Arbitration for Network Packet Switching: PPA, PPE and others
    3.1.2 Logic Synthesis
    3.1.3 Token Rings
  3.2 On-chip Communication
IV ROUND-ROBIN ARBITER DESIGN
  4.1 2x2, 3x3 and 4x4 Bus Arbiter Design
  4.2 Switch Arbiter Design
  4.3 Hierarchical Bus Arbiter Design
  4.4 Impact on Logic Synthesis: Priority Logic Specification
    4.4.1 Hierarchical SA versus PPE
    4.4.2 Hierarchical SA versus PPA
  4.5 Fairness in Arbitration
V RAG: ROUND-ROBIN ARBITER GENERATOR
VI X-GT: CROSSBAR SWITCH GENERATOR FOR MULTIPROCESSOR SOC
  6.1 The Xbar
  6.2 Methodology
  6.3 Integration with DMMU and its generation tool
    6.3.1 Target Architecture
    6.3.2 Tool Integration
VII EXPERIMENTAL RESULTS
  7.1 Arbiter Experiment
    7.1.1 BA Area and Delay Considerations
    7.1.2 SA Area and Delay Comparisons
    7.1.3 Power Dissipation of the Arbiters
    7.1.4 Speedup for a Chip Implementing a 32x32 Network Switch
    7.1.5 Fairness Simulation for Hierarchical SAs
  7.2 Xbar Synthesis
    7.2.1 Xbar Area
    7.2.2 Xbar Delay
    7.2.3 MP-SoC with Xbar and DMMU
VIII CONCLUSION
REFERENCES

LIST OF TABLES

Table 1 Truth table of a 4x4 priority logic block
Table 2 Simulation results, continuous requests
Table 3 Simulation results, bursty on-off traffic
Table 4 Simulation results, TCP traffic using the GTNets log file

LIST OF FIGURES

Figure 1 Internal structure of (32x32)x32 crossbar switch fabric and thirty-two 32x32 SAs of 32x32 network switch
Figure 2 HOL blocking example: without VOQs and with VOQs
Figure 3 32x32 network switch architecture
Figure 4 (3x2)x2 crossbar switch fabric and two 3x3 SAs (Note: reset signals for the SAs not shown)
Figure 5 (2x3)x3 crossbar switch fabric and three 2x2 SAs (Note: reset signals for the SAs not shown)
Figure 6 2x2 ack-req SA block
Figure 7 2x2 root SA
Figure 8 32x32 network switch architecture (Note: Figure 8 is exactly the same as Figure 3)
Figure 9 32x32 Switch Arbiter (SA)
Figure 10 AR2: 2-input PPA and its internal logic
Figure 11 A binary tree structured PPA
Figure 12 Octagon bus
Figure 13 A processing tile with crossbar interconnect
Figure 14 Block diagram and logic diagram of a 4x4 Bus Arbiter
Figure 15 4x4 Bus Arbiter (BA) architecture
Figure 16 Four processors with a shared memory system (Note: bus and 4x4 BA details shown only as needed for Example 4.1.)
Figure 17 Ack-req SA blocks
Figure 18 Root SA blocks
Figure 19 Detailed view of a 4x4 root Switch Arbiter
Figure 20 A 7x7 SA configuration (Note: reset signal not shown)
Figure 21 A 7x7 SA with a different placement of the AND gates (Note: reset signal not shown)
Figure 22 A 4x4 SA implemented by three 2x2 switch arbiter blocks (Note: reset signal not shown)
Figure 23 Hierarchical Switch Arbiter for 32x32 switch (Note: reset signal not shown)
Figure 24 The critical path of Figure 23
Figure 25 Switch Arbiter algorithm
Figure 26 Flowchart of Switch Arbiter algorithm
Figure 27 8x8 hierarchical BA
Figure 28 4x4 ack-req BA
Figure 29 4x4 PPA
Figure 30 AR2 block diagram
Figure 31 4x4 Bus Arbiter (BA) architecture (Note: Figure 31 is exactly the same as Figure 15)
Figure 32 The pointer update scheme of PPE
Figure 33 Flow of RAG tool
Figure 34 Flowchart of Algorithm 2
Figure 35 The example of 4x4 Xbar with four processors and four memory blocks each with a single port
Figure 36 Internal structure of a 4x1 switch
Figure 37 Linked-list data structure for Example 6.2
Figure 38 The SoC target architecture
Figure 39 The target architecture of four processors and four memory blocks each with a single port
Figure 40 The SoC configuration tool flow
Figure 41 Flowchart of DX-Gt
Figure 42 MxM Bus Arbiter area
Figure 43 MxM Bus Arbiter delay
Figure 44 MxM Switch Arbiter area
Figure 45 MxM Switch Arbiter longest delay
Figure 46 Methodology of power estimation
Figure 47 MxM hierarchical BA and BA static power dissipation
Figure 48 MxM hierarchical BA and BA dynamic power dissipation