
Architecture of High Performance Computers Volume II: Array processors and multiprocessor systems PDF

216 Pages·1989·7.1 MB·English


Architecture of High Performance Computers
Volume II
Array processors and multiprocessor systems

R. N. Ibbett and N. P. Topham
Department of Computer Science
University of Edinburgh
Edinburgh
Scotland EH9 3JZ

Springer Science+Business Media, LLC

© Roland N. Ibbett and Nigel P. Topham 1989
Originally published by Springer-Verlag New York in 1989.
All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.
First published 1989
Published by MACMILLAN EDUCATION LTD, Houndmills, Basingstoke, Hampshire RG21 2XS and London
Companies and representatives throughout the world
ISBN 978-1-4899-6703-9
ISBN 978-1-4899-6701-5 (eBook)
DOI 10.1007/978-1-4899-6701-5

Contents

Preface
1 Introduction
    1.1 Parallel hardware structures
    1.2 Taxonomy of parallel architectures
    1.3 Summary of the book
2 Array-processor Architecture
    2.1 Design Issues
        2.1.1 Array processor organisation
        2.1.2 ILLIAC IV - a distributed-memory machine
        2.1.3 BSP - a shared-memory machine
    2.2 Performance issues
        2.2.1 Scalability
    2.3 Summary
3 Interconnection Networks
    3.1 Characteristics of interconnection structures
    3.2 Network routing functions
    3.3 Network topology
        3.3.1 Static networks
        3.3.2 Dynamic networks
        3.3.3 Multi-stage networks
    3.4 Summary
4 Practical Array Architectures
    4.1 The ICL DAP
        4.1.1 System architecture
        4.1.2 Array architecture
        4.1.3 PE architecture
        4.1.4 Instruction set
        4.1.5 Performance
        4.1.6 The DAP-3
    4.2 The Connection Machine
        4.2.1 System architecture
        4.2.2 Processing elements
        4.2.3 The router
    4.3 Summary
5 Array Processor Software
    5.1 Array processing languages
        5.1.1 DAP Fortran
        5.1.2 CM-Lisp
    5.2 Algorithms for array processors
        5.2.1 Partial differential equations
        5.2.2 Minimum path length
    5.3 Summary
6 Multiprocessor Architecture
    6.1 Design issues
        6.1.1 Categories of MIMD architecture
        6.1.2 Granularity
        6.1.3 Load balancing
    6.2 Performance issues
        6.2.1 Speed-up and efficiency
        6.2.2 Extensibility
        6.2.3 Reliability and fault-tolerance
    6.3 Summary
7 Shared-memory Multiprocessors
    7.1 Shared-memory architecture
        7.1.1 Sequential-access shared-memory systems
        7.1.2 Highly-connected shared-memory systems
        7.1.3 Scalable multiprocessors
    7.2 The Sequent Balance 8000
        7.2.1 Cache consistency
        7.2.2 The SLIC
        7.2.3 The SB8000 system bus
    7.3 C.mmp
        7.3.1 The small address problem
        7.3.2 Locks and synchronisation
    7.4 The BBN Butterfly
        7.4.1 Overview of the Butterfly
        7.4.2 Butterfly processing nodes
        7.4.3 The Butterfly switch
        7.4.4 Performance
    7.5 Summary
8 Message-passing Multiprocessors
    8.1 Design issues for message-passing architectures
    8.2 Transputer-based systems
        8.2.1 Architecture of the T414
        8.2.2 The T800 floating point transputer
        8.2.3 Constructing multi-transputer systems
        8.2.4 The Meiko Computing Surface
    8.3 Hypercube multiprocessors
        8.3.1 Cosmic Cube and the Intel iPSC
        8.3.2 The NCUBE/10
        8.3.3 The FPS T series
    8.4 Summary
9 Multiprocessor Software
    9.1 Languages for multiprocessors
        9.1.1 Ada
        9.1.2 Occam
    9.2 Multiprocessor algorithms
        9.2.1 Sorting on a shared-memory architecture
        9.2.2 Matrix multiplication using message-passing
    9.3 Summary
Bibliography
Index

Preface

This book is the second volume of a two-volume set covering the architecture of high performance computers. The division of material between the two volumes has been devised so that Volume I essentially deals with architectures in which parallelism is used to attain high performance but is hidden from the programmer, whereas Volume II deals with machines which are explicitly parallel in nature. Volume I therefore describes architectural techniques that can be used, and indeed have become widespread, in the design of individual high performance processors, whereas this volume concentrates on the architecture of systems in which a number of processors operate in concert to achieve high performance. The high performance structures described in Volume I are naturally applicable to the design of the elements within parallel processors. Volume II represents a historical progression from Volume I, describing some architectures and machines which have evolved recently and could be described as 'state-of-the-art'.

Computer architecture is an extensive subject, with a large body of mostly descriptive literature, and any treatment of the subject is necessarily incomplete. There are many high performance architectures, both on the market and within research environments, far too many to cover in a student text. We have attempted to extract the fundamental principles of high performance architectures and set them in perspective with case studies. Where possible we have used commercially available machines as our examples.

The two volumes of this book are designed to accompany undergraduate courses in computer architecture, and constitute a core of material presented in third and fourth year courses in the Computer Science Department at Edinburgh University.
The authors would like to thank Duncan Roweth for vetting the section which describes the Meiko Computing Surface, as well as the colleagues and friends who read and commented on other parts of the manuscript.

Roland Ibbett
Nigel Topham

1 Introduction

In volume I of this two-volume set we examined the architectural techniques that have been used to produce high performance computers. This included techniques to maximise processor performance; for example, instruction pipelines and parallel functional units. It also included techniques to maximise the throughput, and minimise the latency, of storage structures; for example, interleaving and caching respectively. We saw how these design techniques can be brought together in the form of vector processors in order to provide a platform for very high performance numerical processing. However, all the machines considered in volume I have something in common; they operate within a relatively conventional programming model, and this means that high-level language programs written for one high performance architecture will work equally well on another, with little or no modification.

In this book we are concerned with architectures for which this does not necessarily hold true, and for which new languages and new application algorithms are required. This naturally implies a greater overall design effort, but in many cases this is outweighed by the resulting gain in performance. The architectures dealt with by this book all embody some form of parallel processing capability that cannot be hidden from the user's view of the machine, at least not without the aid of compilers that are able to decompose a conventional program into fragments of parallel code automatically.

One question which must be answered is 'why do we need to consider new architectures when existing architectures have served so well in the past?'.
There are in fact several good reasons why we should consider new architectures, and as always they stem from changes in the cost and performance characteristics of modern technology. Perhaps most importantly, the cost of replicating a piece of logic, as opposed to making it work faster, has fallen dramatically. This is due to advances in micro-fabrication technology. Thus it has become cheaper to build a system using a hundred microprocessors than to build a single-processor system that is one hundred times more powerful than a single microprocessor. This follows from two related facts: firstly, the cost of each transistor on a silicon die has fallen continuously since integrated circuits were developed, and secondly, the number of transistors that can be squeezed onto a single silicon die has also increased. This has now reached the point where complete processors, using many of the techniques found in volume I, can be fabricated as a single device. For example, the Motorola M88100 microprocessor supports a number of parallel functional units and has a scoreboard to deal with data dependencies as in the CDC 6600 and 7600 machines.

Figure 1.1 A spatially-parallel structure

It is anticipated that as a result of the availability of high performance single-chip processors, and the ever-increasing demand for more powerful computing systems, the market for parallel processors will increase dramatically during the late 1980s. For example, one market prediction [JD86] states that the value of sales of parallel processors in the UK alone is likely to increase by 500% in the period 1988-89. The supercomputer market, traditionally the principal beneficiary of research into high performance computer systems, is expected to be eclipsed by the expanding market for parallel workstations and parallel symbolic processors as small-scale parallel systems become more widely available.
1.1 Parallel hardware structures

All computing systems are constructed from interconnected components, and depending on the level of abstraction at which a system is viewed, these components could be transistors, gates, registers, arithmetic units, memories, or even complete processors. At all levels of abstraction there are two fundamental ways in which components can be composed to create parallel computing structures.

Perhaps the simplest way to introduce parallelism into a computing structure is to replicate a component n times, as shown in figure 1.1. To exploit this form of parallelism, the units of information processed by the original (non-parallel) component must be partitionable. In other words the task space must be parallelised. For this reason this form of parallel-
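
As an illustration only, the idea of replicating a component n times and partitioning its input can be sketched in Python. Nothing below comes from the book: the names `partition`, `component` and `run_replicated` are hypothetical, the "component" is an arbitrary stand-in that squares numbers, and threads merely play the role of the replicated hardware units (in CPython they will not speed up this kind of arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def component(chunk):
    """Hypothetical stand-in for the replicated component: squares each value."""
    return [x * x for x in chunk]


def run_replicated(items, n):
    """Give each of n copies of the component one chunk, then rejoin results in order."""
    chunks = partition(items, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map preserves chunk order, so the output lines up with the input
        parts = list(pool.map(component, chunks))
    return [y for part in parts for y in part]
```

The sketch only works because the input can be cut into independent pieces; if each result depended on its neighbours, the chunks could not be handed to separate copies without extra communication, which is the partitioning requirement stated above.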
