MULTITHREADED
COMPUTER
ARCHITECTURE:
A SUMMARY OF THE
STATE OF THE ART
THE KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE
EDITED BY
Robert A. Iannucci
Exa Corporation
Cambridge, Massachusetts, USA
•
Guang R. Gao
McGill University
Montreal, Quebec, Canada
•
Robert H. Halstead, Jr.
Digital Equipment Corporation
Cambridge, Massachusetts, USA
•
Burton Smith
Tera Computer Company
Seattle, Washington, USA
KLUWER ACADEMIC PUBLISHERS
Boston/London/Dordrecht
Distributors for North America:
Kluwer Academic Publishers
101 Philip Drive
Assinippi Park
Norwell, Massachusetts 02061 USA
Distributors for all other countries:
Kluwer Academic Publishers Group
Distribution Centre
Post Office Box 322
3300 AH Dordrecht, THE NETHERLANDS
Library of Congress Cataloging-in-Publication Data
Multithreaded computer architecture: a summary of the state of the
art / edited by Robert A. Iannucci ... [et al.].
p. cm. -- (Kluwer international series in engineering and
computer science ; 0281)
Includes bibliographical references and index.
ISBN 0-7923-9477-1 (alk. paper)
1. Computer architecture. I. Iannucci, Robert A., 1955-
II. Series: Kluwer international series in engineering and computer
science ; SECS 0281.
QA76.9.A73M85 1994
004'.32--dc20 94-21605
CIP
Copyright © 1994 by Kluwer Academic Publishers
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system or transmitted in any form or by any means, mechanical,
photo-copying, recording, or otherwise, without the prior written permission of
the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park,
Norwell, Massachusetts 02061
Printed on acid-free paper.
CONTENTS
PREFACE xiii
PART I: BACKGROUND AND ISSUES
1 MULTITHREADED ARCHITECTURES:
PRINCIPLES, PROJECTS, AND ISSUES
Jack B. Dennis and Guang R. Gao 1
1 Introduction 1
2 Microprocessor Evolution: Principles and Challenges 3
3 Multithreaded Program Execution Models 15
4 HEP: The Heterogeneous Element Processor System 18
5 A Dataflow Architecture 24
6 Monsoon 30
7 Other Multithreaded Architecture Projects 36
8 Issues in Multithreaded Architecture 42
9 Conclusions 58
REFERENCES 60
2 ARCHITECTURAL AND IMPLEMENTATION
ISSUES FOR MULTITHREADING (PANEL
DISCUSSION)
Robert A. Iannucci 73
1 Introduction 73
2 Summary of the Discussion 74
3 Conclusion 77
3 ISSUES IN THE DESIGN AND
IMPLEMENTATION OF INSTRUCTION
PROCESSORS FOR MULTICOMPUTERS
(POSITION STATEMENT)
William J. Dally 79
1 Multicomputers and Multithreading 79
2 Implementation Issues 80
3 Pointers to Related Papers 82
REFERENCES 82
4 PROGRAMMING, COMPILATION, AND
RESOURCE-MANAGEMENT ISSUES FOR
MULTITHREADING (PANEL DISCUSSION)
Robert H. Halstead, Jr. 83
1 Introduction 83
2 Summary of the Discussion 84
3 Conclusion 88
5 PROGRAMMING, COMPILATION AND
RESOURCE MANAGEMENT ISSUES FOR
MULTITHREADING (POSITION STATEMENT)
Rishiyur S. Nikhil 89
1 Multithreaded Architectures Enable General Purpose Parallel Programming 89
2 Multithreaded Architectures Will Run Multicomputer Software Well (or Better) 90
3 Sources of Parallelism 91
4 Compilation and Resource Management 92
5 Conclusion 94
REFERENCES 94
6 MULTITHREADING: FUNDAMENTAL LIMITS,
POTENTIAL GAINS, AND ALTERNATIVES
David E. Culler 97
1 Introduction 97
2 Multithreading to Tolerate Latency 100
3 Network Limits on Multithreading 108
4 Active Messages and Split-C 114
5 Multithreading to Support Dynamic Parallelism 123
6 Summary 133
REFERENCES 135
PART II: KEY ELEMENTS
7 LOW-COST SUPPORT FOR FINE-GRAIN
SYNCHRONIZATION IN MULTIPROCESSORS
David Kranz, Beng-Hong Lim, Anant Agarwal and Donald Yeung 139
1 Introduction 140
2 A Low-Cost Approach to Fine-Grain Synchronization 142
3 Programming Language Issues 144
4 Alewife Implementation 146
5 Performance Results 154
6 Related Work 161
7 Conclusions 163
REFERENCES 164
8 ARCHITECTURAL AND IMPLEMENTATION
TRADEOFFS IN THE DESIGN OF MULTIPLE
CONTEXT PROCESSORS
James Laudon, Anoop Gupta and Mark Horowitz 167
1 Introduction 167
2 Interleaved Multiple-Context Processor Proposal 170
3 Evaluation Methodology 174
4 Performance Results 181
5 Implementation Issues 189
6 Conclusions 196
REFERENCES 197
9 NAMED STATE AND EFFICIENT CONTEXT
SWITCHING
Peter R. Nuth and William J. Dally 201
1 Introduction 201
2 Multithreaded Processors 203
3 The Named-State Register File 204
4 Register Utilization 207
5 Implementation 208
6 Performance 209
7 Conclusion 211
REFERENCES 211
10 IDEAS FOR THE DESIGN OF
MULTITHREADED PIPELINES
Amos R. Omondi 213
1 Introduction 213
2 Architecture 214
3 Implementation 225
4 Conclusion 248
REFERENCES 249
PART III: SYSTEMS
11 INTEGRATED SUPPORT FOR
HETEROGENEOUS PARALLELISM
Gail Alverson, Bob Alverson, David Callahan, Brian Koblenz, Allan
Porterfield, and Burton Smith 253
1 Introduction 253
2 Related Systems 255
3 Overview of the Tera Architecture 262
4 Very Fine-grained Parallelism 263
5 Fine-grained Parallelism 265
6 Medium-Grained Parallelism 271
7 Coarse-Grain Parallelism 276
8 Summary 279
REFERENCES 280
12 AN ARCHITECTURE FOR GENERALIZED
SYNCHRONIZATION AND FAST SWITCHING
Kattamuri Ekanadham, Steve Gregor, Kei Hiraki, Robert A. Iannucci
and Ragunathan Rajkumar 285
1 Introduction 285
2 Architectural Highlights 287
3 System Description 289
4 Compilation and Resource Management 296
5 Hardware Design 299
6 Support for Real-Time Systems 307
7 Conclusions 313
REFERENCES 315
13 CONCURRENT EXECUTION OF
HETEROGENEOUS THREADS IN THE
SUPER-ACTOR MACHINE
Herbert H.J. Hum, Guang R. Gao 317
1 Introduction 317
2 The Super-Actor Execution Model 321
3 The Architecture of the Super-Actor Machine 333
4 SAXPY Revisited 342
5 Conclusion 347
REFERENCES 348
PART IV: ANALYSIS
14 ANALYSIS OF MULTITHREADED
MICROPROCESSORS UNDER
MULTIPROGRAMMING
David E. Culler, Michial Gunter, James C. Lee 351
1 Introduction 352
2 Analytical Model 353
3 Method of Analysis 355
4 Multithreaded Cache Behavior 357
5 Unfairness of Switch-on-miss Multithreading 363
6 Processor Utilization 365
7 Conclusions 367
REFERENCES 370
15 EXPLOITING LOCALITY IN HYBRID
DATAFLOW PROGRAMS
Walid A. Najjar, A. P. Wim Bohm, W. Marcus Miller 373
1 Introduction 373
2 Nature and Impact of Locality 374
3 Thread Locality in Dataflow Execution 376
4 Conclusion 385
REFERENCES 385
INDEX 389