
The Microarchitecture of Pipelined and Superscalar Computers PDF

273 Pages·1999·7.469 MB·English

Preview The Microarchitecture of Pipelined and Superscalar Computers

THE MICROARCHITECTURE OF PIPELINED AND SUPERSCALAR COMPUTERS

by AMOS R. OMONDI
Department of Computer Science, Flinders University, Adelaide, Australia

Springer-Science+Business Media, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-5081-9
ISBN 978-1-4757-2989-4 (eBook)
DOI 10.1007/978-1-4757-2989-4

Published by Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
Sold and distributed in North, Central and South America by Kluwer Academic Publishers, 101 Philip Drive, Norwell, MA 02061, U.S.A.
In all other countries, sold and distributed by Kluwer Academic Publishers, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved
© 1999 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers Boston in 1999.
Softcover reprint of the hardcover 1st edition 1999

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

To my anchor: Anne Hayes
and my mother: Mary Omondi

Contents

Preface
Acknowledgments
1. FUNDAMENTALS OF PIPELINING
   1.1 Introduction
   1.2 A taxonomy of pipelines
   1.3 Ideal performance of a pipeline
   1.4 Impediments to ideal performance
   1.5 Case studies
   1.6 Summary
2. TIMING AND CONTROL OF PIPELINES
   2.1 Clock-cycle bounds
   2.2 Clock-signal distribution
   2.3 Latch design
   2.4 Structural hazards and sequencing control
   2.5 Summary
3. HIGH-PERFORMANCE MEMORY SYSTEMS
   3.1 Memory interleaving
      3.1.1 Basic principles
      3.1.2 Addressing patterns
      3.1.3 Case studies
   3.2 Caches
      3.2.1 Placement policies
      3.2.2 Replacement policies
      3.2.3 Fetch policies
      3.2.4 Write policies
      3.2.5 Performance
      3.2.6 Case studies
   3.3 Summary
4. CONTROL FLOW: BRANCHING AND CONTROL HAZARDS
   4.1 Pipeline length and location of control point
   4.2 Latency reduction by instruction buffering
      4.2.1 Types of instruction buffer
      4.2.2 Case studies
      4.2.3 Comparison of instruction buffers
      4.2.4 Summary
   4.3 Static instruction-scheduling
   4.4 Branch prediction
      4.4.1 Static and semi-static prediction
      4.4.2 Dynamic prediction
      4.4.3 Case studies
      4.4.4 Summary
   4.5 Predicated execution
   4.6 Other solutions of the branching problem
   4.7 Effect of instruction-set architecture
   4.8 Summary
5. DATA FLOW: DETECTING AND RESOLVING DATA HAZARDS
   5.1 Types of data hazards
   5.2 Implementing renaming
   5.3 Fast resolution of true hazards
   5.4 Case studies
   5.5 Summary
6. VECTOR PIPELINES
   6.1 Fundamentals
   6.2 Storage and addressing of vectors
   6.3 Instruction sets and formats
   6.4 Programming techniques
   6.5 Performance
   6.6 Case studies
   6.7 Summary
7. INTERRUPTS AND BRANCH MISPREDICTIONS
   7.1 Implementation techniques for precise interrupts
      7.1.1 In-order completion
      7.1.2 Reorder-buffer
      7.1.3 History-buffer
      7.1.4 Future-file
      7.1.5 Checkpoint-repair
      7.1.6 Register update unit
      7.1.7 Other aspects of machine-state
      7.1.8 Summary
   7.2 Case studies
   7.3 Summary
Bibliography

Preface

This book is intended to serve as a textbook for a second course in the implementation (i.e. microarchitecture) of computer architectures.
The subject matter covered is the collection of techniques that are used to achieve the highest performance in single-processor machines; these techniques center on the exploitation of low-level parallelism (temporal and spatial) in the processing of machine instructions. The target audience consists of students in the final year of an undergraduate program or in the first year of a postgraduate program in computer science, computer engineering, or electrical engineering; professional computer designers will also find the book useful as an introduction to the topics covered. Typically, the author has used the material presented here as the basis of a full-semester undergraduate course or a half-semester postgraduate course, with the other half of the latter devoted to multiple-processor machines. The background assumed of the reader is a good first course in computer architecture and implementation - to the level in, say, Computer Organization and Design, by D. Patterson and J. Hennessy - and familiarity with digital-logic design.

The book consists of seven chapters. The first chapter is an introduction to all of the main ideas that the following chapters cover in detail: the topics covered are the main forms of pipelining used in high-performance uniprocessors, a taxonomy of the space of pipelined processors, and performance issues. It is also intended that this chapter should be readable as a brief "stand-alone" survey. The second chapter consists of a brief discussion of issues in timing and control: the topics covered are bounds on the processor clock-cycle, the implementation of clocking systems, the design of specialized latches for pipelining, and how to detect and deal with potential conflicts in resource usage. The third chapter deals with the implementation of high-performance memory systems and should largely be a review for well-prepared readers.
The fourth, and longest, chapter deals with the problem of ensuring that the processor is always adequately supplied with instructions; this is probably the most difficult problem in the design of a pipelined machine. The fifth chapter deals with a similar, but more tractable, problem in the flow of data. The sixth chapter is an introduction to the design of pipelines to process vector data. And the last chapter covers mechanisms used to facilitate precise interruption.

Each chapter is divided into two main parts: one part covers the general principles and ideas, and the other part gives case studies taken from practical machines. A few remarks should be made regarding these case studies. First, they have all been selected to demonstrate particular points, although in a few cases there are similarities between the different machines covered; the reader therefore need not review at once all the case studies in any one chapter but may take different ones on different readings. Second, a requirement in their selection has been that they be adequately documented in the published literature; this has necessarily meant the exclusion of machines that would otherwise be of interest. Third, the reader should bear in mind that the latest machines do not necessarily employ fundamentally new techniques, nor is it the case that "old" machines necessarily employ "out-of-date" techniques: many of the sophisticated ideas in the implementation of current high-performance machines appeared twenty to thirty years ago - in machines such as the CDC 6600 and the IBM 360/91. Lastly, the amount of detail given on the various machines has been determined by what is available in the published literature, and this varies greatly; this has sometimes resulted in an uneven coverage, which should not be taken as an indicator of relative importance.

Acknowledgments

Gratitude to The Most Merciful, The Compassionate and Compassionating One, without whom no good thing is possible.

1 FUNDAMENTALS OF PIPELINING

We shall begin by introducing the main issues in the design and implementation of pipelined and superscalar computers, in which the exploitation of low-level parallelism constitutes the main means for high performance. The first section of the chapter consists of a discussion of the basic principles underlying the design of such computers. The second section gives a taxonomy for the classification of pipelined machines and introduces a number of commonly used terms. The third and fourth sections deal with the performance of pipelines: ideal performance and impediments to achieving this are examined. The fifth section consists of some examples of practical pipelines; these pipelines form the basis for detailed case studies in subsequent chapters. The last section is a summary.

1.1 INTRODUCTION

The basic technique used to obtain high performance in the design of pipelined machines is the same one used to obtain high productivity in factory assembly lines. In the latter situation, the work involved in the production of some object is partitioned among a group of workers arranged in a linear order such that each worker performs a particular task in the production process before passing the partially completed product down to the next worker in the line. All workers operate concurrently, and a completed product is available at the end of the line. This arrangement is an example of temporal parallelism - that
