Parallel Programming in C with MPI and OpenMP

258 pages · 2003 · 16.886 MB · English
Parallel Programming in C with MPI and OpenMP

Michael J. Quinn
Oregon State University

McGraw-Hill Higher Education
Boston  Burr Ridge, IL  Dubuque, IA  Madison, WI  New York  San Francisco  St. Louis
Caracas  Kuala Lumpur  Lisbon  London  Madrid  Mexico  Milan  Montreal  New Delhi
Santiago  Seoul  Singapore  Sydney  Taipei  Toronto

With gratitude for their love, support, and guidance, I dedicate this book to my parents, Edward and Georgia Quinn.

PARALLEL PROGRAMMING IN C WITH MPI AND OPENMP
International Edition 2003
Exclusive rights by McGraw-Hill Education (Asia), for manufacture and export. This book cannot be re-exported from the country to which it is sold by McGraw-Hill. The International Edition is not available in North America.

PERFORMANCE ANALYSIS FORMULAS

Amdahl's Law: Let f be the fraction of operations in a computation that must be performed sequentially, where 0 ≤ f ≤ 1. The maximum speedup ψ achievable by a parallel computer with p processors performing the computation is

    ψ ≤ 1 / (f + (1 − f)/p)

Gustafson-Barsis's Law: Given a parallel program solving a problem of size n using p processors, let s denote the fraction of total execution time spent in serial code. The maximum speedup ψ achievable by this program is

    ψ ≤ p + (1 − p)s

Karp-Flatt Metric: Given speedup ψ on p processors, where p > 1, the experimentally determined serial fraction e is defined to be

    e = (1/ψ − 1/p) / (1 − 1/p)

Isoefficiency Relation: Suppose a parallel system exhibits efficiency ε(n, p), where n denotes problem size and p denotes number of processors. Define C = ε(n, p) / (1 − ε(n, p)), let T(n, 1) denote sequential execution time, and let T₀(n, p) denote parallel overhead (the total amount of time spent by all processors performing communications and redundant computations). In order to maintain the same level of efficiency as the number of processors increases, problem size must be increased so that the following inequality is satisfied:

    T(n, 1) ≥ C · T₀(n, p)
Published by McGraw-Hill, a business unit of The McGraw-Hill Companies, Inc., 1221 Avenue of the Americas, New York, NY 10020. Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of The McGraw-Hill Companies, Inc., including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside the United States.

10 09 08 07 06 20 09 08 07 CTF SLP

Library of Congress Cataloging-in-Publication Data

Quinn, Michael J. (Michael Jay)
    Parallel programming in C with MPI and OpenMP / Michael J. Quinn.—1st ed.
    p. cm.
    ISBN 0-07-282256-2
    1. C (Computer program language). 2. Parallel programming (Computer science). I. Title.
    005.13'3—dc21    2003046371    CIP

When ordering this title, use ISBN 0-07-123265-6

Printed in Singapore

www.mhhe.com

BRIEF TABLE OF CONTENTS

Preface xiv
1 Motivation and History 1
2 Parallel Architectures 27
3 Parallel Algorithm Design 63
4 Message-Passing Programming 93
5 The Sieve of Eratosthenes 115
6 Floyd's Algorithm 137
7 Performance Analysis 159
8 Matrix-Vector Multiplication 178
9 Document Classification 216
10 Monte Carlo Methods 239
11 Matrix Multiplication 273
12 Solving Linear Systems 290
13 Finite Difference Methods 318
14 Sorting 338
15 The Fast Fourier Transform 353
16 Combinatorial Search 369
17 Shared-Memory Programming 404
18 Combining MPI and OpenMP 436
Appendix A MPI Functions 450
Appendix B Utility Functions 485
Appendix C Debugging MPI Programs 505
Appendix D Review of Complex Numbers 509
Appendix E OpenMP Functions 513
Bibliography 515
Author Index 520
Subject Index 522

CONTENTS

Preface xiv

CHAPTER 1 Motivation and History 1
1.1 Introduction 1
1.2 Modern Scientific Method 3
1.3 Evolution of Supercomputing 4
1.4 Modern Parallel Computers 5
    1.4.1 The Cosmic Cube 6
    1.4.2 Commercial Parallel Computers 6
    1.4.3 Beowulf 7
    1.4.4 Advanced Strategic Computing Initiative 8
1.5 Seeking Concurrency 9
    1.5.1 Data Dependence Graphs 9
    1.5.2 Data Parallelism 10
    1.5.3 Functional Parallelism 10
    1.5.4 Pipelining 12
    1.5.5 Size Considerations 13
1.6 Data Clustering 14
1.7 Programming Parallel Computers 17
    1.7.1 Extend a Compiler 17
    1.7.2 Extend a Sequential Programming Language
    1.7.3 Add a Parallel Programming Layer 19
    1.7.4 Create a Parallel Language 19
    1.7.5 Current Status 21
1.8 Summary 21
1.9 Key Terms 22
1.10 Bibliographic Notes 22
1.11 Exercises 23

CHAPTER 2 Parallel Architectures 27
2.1 Introduction 27
2.2 Interconnection Networks 28
    2.2.1 Shared versus Switched Media 28
    2.2.2 Switch Network Topologies 29
    2.2.3 2-D Mesh Network 29
    2.2.4 Binary Tree Network 30
    2.2.5 Hypertree Network 31
    2.2.6 Butterfly Network 32
    2.2.7 Hypercube Network
    2.2.8 Shuffle-Exchange Network 35
    2.2.9 Summary 36
2.3 Processor Arrays 37
    2.3.1 Architecture and Data-Parallel Operations 37
    2.3.2 Processor Array Performance 39
    2.3.3 Processor Interconnection Network 40
    2.3.4 Enabling and Disabling Processors 40
    2.3.5 Additional Architectural Features 42
    2.3.6 Shortcomings of Processor Arrays 42
2.4 Multiprocessors 43
    2.4.1 Centralized Multiprocessors 43
    2.4.2 Distributed Multiprocessors 45
2.5 Multicomputers 49
    2.5.1 Asymmetrical Multicomputers 49
    2.5.2 Symmetrical Multicomputers 51
    2.5.3 Which Model Is Best for a Commodity Cluster? 52
    2.5.4 Differences between Clusters and Networks of Workstations 53
2.6 Flynn's Taxonomy 54
    2.6.1 SISD 54
    2.6.2 SIMD 55
    2.6.3 MISD 55
    2.6.4 MIMD 56
2.7 Summary 58
2.8 Key Terms 59
2.9 Bibliographic Notes 59
2.10 Exercises 60

CHAPTER 3 Parallel Algorithm Design 63
3.1 Introduction 63
3.2 The Task/Channel Model 63
3.3 Foster's Methodology 64
    3.3.1 Partitioning 65
    3.3.2 Communication 67
    3.3.3 Agglomeration 68
    3.3.4 Mapping 70
3.4 Boundary Value Problem 73
    3.4.1 Introduction 73
    3.4.2 Partitioning 75
    3.4.3 Communication 75
    3.4.4 Agglomeration and Mapping 76
    3.4.5 Analysis 76
3.5 Finding the Maximum 77
    3.5.1 Introduction 77
    3.5.2 Partitioning 77
    3.5.3 Communication 77
    3.5.4 Agglomeration and Mapping 81
    3.5.5 Analysis 82
3.6 The n-Body Problem 82
    3.6.1 Introduction 82
    3.6.2 Partitioning 83
    3.6.3 Communication 83
    3.6.4 Agglomeration and Mapping 85
    3.6.5 Analysis 85
3.7 Adding Data Input 86
    3.7.1 Introduction 86
    3.7.2 Communication 87
    3.7.3 Analysis 88
3.8 Summary 89
3.9 Key Terms 90
3.10 Bibliographic Notes 90
3.11 Exercises 90

CHAPTER 4 Message-Passing Programming 93
4.1 Introduction 93
4.2 The Message-Passing Model 94
4.3 The Message-Passing Interface 95
4.4 Circuit Satisfiability 96
    4.4.1 Function MPI_Init 99
    4.4.2 Functions MPI_Comm_rank and MPI_Comm_size 99
    4.4.3 Function MPI_Finalize 101
    4.4.4 Compiling MPI Programs 102
    4.4.5 Running MPI Programs 102
4.5 Introducing Collective Communication 104
    4.5.1 Function MPI_Reduce 105
4.6 Benchmarking Parallel Performance 108
    4.6.1 Functions MPI_Wtime and MPI_Wtick 108
    4.6.2 Function MPI_Barrier 108
4.7 Summary 110
4.8 Key Terms 110
4.9 Bibliographic Notes 110
4.10 Exercises

CHAPTER 5 The Sieve of Eratosthenes 115
5.1 Introduction 115
5.2 Sequential Algorithm 115
5.3 Sources of Parallelism 117
5.4 Data Decomposition Options 117
    5.4.1 Interleaved Data Decomposition 118
    5.4.2 Block Data Decomposition 118
    5.4.3 Block Decomposition Macros 120
    5.4.4 Local Index versus Global Index 120
    5.4.5 Ramifications of Block Decomposition 121
5.5 Developing the Parallel Algorithm 121
    5.5.1 Function MPI_Bcast 122
5.6 Analysis of Parallel Sieve Algorithm 122
5.7 Documenting the Parallel Program 123
5.8 Benchmarking 128
5.9 Improvements 129
    5.9.1 Delete Even Integers 129
    5.9.2 Eliminate Broadcast 130
    5.9.3 Reorganize Loops 131
    5.9.4 Benchmarking 131
5.10 Summary 133
5.11 Key Terms 134
5.12 Bibliographic Notes 134
5.13 Exercises 134

CHAPTER 6 Floyd's Algorithm 137
6.1 Introduction 137
6.2 The All-Pairs Shortest-Path Problem 137
6.3 Creating Arrays at Run Time 139
6.4 Designing the Parallel Algorithm 140
    6.4.1 Partitioning 140
    6.4.2 Communication 141
    6.4.3 Agglomeration and Mapping 142
    6.4.4 Matrix Input/Output 143
6.5 Point-to-Point Communication 145
    6.5.1 Function MPI_Send 146
    6.5.2 Function MPI_Recv 147
    6.5.3 Deadlock 148
6.6 Documenting the Parallel Program
6.7 Analysis and Benchmarking 151
6.8 Summary 154
6.9 Key Terms 154
6.10 Bibliographic Notes 154
6.11 Exercises 154

CHAPTER 7 Performance Analysis 159
7.1 Introduction 159
7.2 Speedup and Efficiency 159
7.3 Amdahl's Law 161
    7.3.1 Limitations of Amdahl's Law 164
    7.3.2 The Amdahl Effect 164
7.4 Gustafson-Barsis's Law 164
7.5 The Karp-Flatt Metric 167
7.6 The Isoefficiency Metric 170
7.7 Summary 174
7.8 Key Terms 175
7.9 Bibliographic Notes 175
7.10 Exercises 176

CHAPTER 8 Matrix-Vector Multiplication 178
8.1 Introduction 178
8.2 Sequential Algorithm 179
8.3 Data Decomposition Options 180
8.4 Rowwise Block-Striped Decomposition 181
    8.4.1 Design and Analysis 181
    8.4.2 Replicating a Block-Mapped Vector 183
    8.4.3 Function MPI_Allgatherv 184
    8.4.4 Replicated Vector Input/Output 186
    8.4.5 Documenting the Parallel Program 187
    8.4.6 Benchmarking 187
8.5 Columnwise Block-Striped Decomposition 189
    8.5.1 Design and Analysis 189
    8.5.2 Reading a Columnwise Block-Striped Matrix 191
    8.5.3 Function MPI_Scatterv 191
    8.5.4 Printing a Columnwise Block-Striped Matrix 193
    8.5.5 Function MPI_Gatherv 193
    8.5.6 Distributing Partial Results 195
    8.5.7 Function MPI_Alltoallv 195
    8.5.8 Documenting the Parallel Program 196
    8.5.9 Benchmarking 198
8.6 Checkerboard Block Decomposition 199
    8.6.1 Design and Analysis 199
    8.6.2 Creating a Communicator 202
    8.6.3 Function MPI_Dims_create 203
    8.6.4 Function MPI_Cart_create 204
    8.6.5 Reading a Checkerboard Matrix 205
    8.6.6 Function MPI_Cart_rank 205
    8.6.7 Function MPI_Cart_coords 207
    8.6.8 Function MPI_Comm_split 207
    8.6.9 Benchmarking 208
8.7 Summary 210
8.8 Key Terms 211
8.9 Bibliographic Notes 211
8.10 Exercises 211

CHAPTER 9 Document Classification 216
9.1 Introduction 216
9.2 Parallel Algorithm Design 217
    9.2.1 Partitioning and Communication 217
    9.2.2 Agglomeration and Mapping 217
    9.2.3 Manager/Worker Paradigm 218
    9.2.4 Manager Process 219
    9.2.5 Function MPI_Abort 220
    9.2.6 Worker Process 221
    9.2.7 Creating a Workers-only Communicator 223
9.3 Nonblocking Communications 223
    9.3.1 Manager's Communication 224
    9.3.2 Function MPI_Irecv 224
    9.3.3 Function MPI_Wait 225
    9.3.4 Workers' Communications 225
    9.3.5 Function MPI_Isend 225
    9.3.6 Function MPI_Probe 225
    9.3.7 Function MPI_Get_count 226
9.4 Documenting the Parallel Program 226
9.5 Enhancements 232
    9.5.1 Assigning Groups of Documents 232
    9.5.2 Pipelining 232
    9.5.3 Function MPI_Testsome 234
9.6 Summary 235
9.7 Key Terms 236
9.8 Bibliographic Notes 236
9.9 Exercises 236

CHAPTER 10 Monte Carlo Methods 239
10.1 Introduction 239
    10.1.1 Why Monte Carlo Works 240
    10.1.2 Monte Carlo and Parallel Computing
10.2 Sequential Random Number Generators 243
    10.2.1 Linear Congruential 244
    10.2.2 Lagged Fibonacci 245
10.3 Parallel Random Number Generators 245
    10.3.1 Manager-Worker Method 246
    10.3.2 Leapfrog Method 246
    10.3.3 Sequence Splitting 247
    10.3.4 Parameterization 248
10.4 Other Random Number Distributions 248
    10.4.1 Inverse Cumulative Distribution Function Transformation 249
    10.4.2 Box-Muller Transformation 250
    10.4.3 The Rejection Method 251
10.5 Case Studies 253
    10.5.1 Neutron Transport 253
    10.5.2 Temperature at a Point Inside a 2-D Plate 255
    10.5.3 Two-Dimensional Ising Model 257
    10.5.4 Room Assignment Problem 259
    10.5.5 Parking Garage 262
    10.5.6 Traffic Circle 264
10.6 Summary 268
10.7 Key Terms 269
10.8 Bibliographic Notes 269
10.9 Exercises 270

CHAPTER 11 Matrix Multiplication 273
11.1 Introduction 273
11.2 Sequential Matrix Multiplication 274
    11.2.1 Iterative, Row-Oriented Algorithm 274
    11.2.2 Recursive, Block-Oriented Algorithm 275
11.3 Rowwise Block-Striped Parallel Algorithm 277
    11.3.1 Identifying Primitive Tasks 277
    11.3.2 Agglomeration 278
    11.3.3 Communication and Further Agglomeration 279
    11.3.4 Analysis 279
11.4 Cannon's Algorithm 281
    11.4.1 Agglomeration 281
    11.4.2 Communication 283
    11.4.3 Analysis 284
11.5 Summary 286
11.6 Key Terms 287
11.7 Bibliographic Notes 287
11.8 Exercises 287

CHAPTER 12 Solving Linear Systems 290
12.1 Introduction 290
12.2 Terminology 291
12.3 Back Substitution 292
    12.3.1 Sequential Algorithm 292
    12.3.2 Row-Oriented Parallel Algorithm 293
    12.3.3 Column-Oriented Parallel Algorithm 295
    12.3.4 Comparison 295
12.4 Gaussian Elimination 296
    12.4.1 Sequential Algorithm 296
    12.4.2 Parallel Algorithms 298
    12.4.3 Row-Oriented Algorithm 299
    12.4.4 Column-Oriented Algorithm 303
    12.4.5 Comparison 303
    12.4.6 Pipelined, Row-Oriented Algorithm 304
12.5 Iterative Methods 306
12.6 The Conjugate Gradient Method 309
    12.6.1 Sequential Algorithm 309
    12.6.2 … 310
12.7 Summary
12.8 Key Terms 314
12.9 Bibliographic Notes 314
12.10 Exercises 314

CHAPTER 13 Finite Difference Methods 318
13.1 Introduction 318
13.2 Partial Differential Equations 320
    13.2.1 Categorizing PDEs 320
    13.2.2 Difference Quotients 321
13.3 Vibrating String 322
    13.3.1 Deriving Equations 322
    13.3.2 Deriving the Sequential Program 323
    13.3.3 Parallel Program Design 324
    13.3.4 Isoefficiency Analysis 327
    13.3.5 Replicating Computations 327
13.4 Steady-State Heat Distribution 329
    13.4.1 Deriving Equations 329
    13.4.2 Deriving the Sequential Program 330
    13.4.3 Parallel Program Design 332
    13.4.4 Isoefficiency Analysis 332
    13.4.5 Implementation Details 334
13.5 Summary 334
13.6 Key Terms 335
13.7 Bibliographic Notes 335
13.8 Exercises 335

CHAPTER 14 Sorting 338
14.1 Introduction 338
14.2 Quicksort 339
14.3 … 340
14.4 Hyperquicksort
    14.4.1 Algorithm Description 343
    14.4.2 Isoefficiency Analysis 345
14.5 Parallel Sorting by Regular Sampling 346
    14.5.1 Algorithm Description 346
    14.5.2 Isoefficiency Analysis 347
14.6 Summary 349
14.7 Key Terms 349
14.8 Bibliographic Notes 350
14.9 Exercises 350

CHAPTER 15 The Fast Fourier Transform 353
15.1 Introduction 353
15.2 Fourier Analysis 353
15.3 The Discrete Fourier Transform 355
    15.3.1 Inverse Discrete Fourier Transform 357
    15.3.2 …
15.4 The Fast Fourier Transform 360
15.5 Parallel Program Design 363
    15.5.1 Partitioning and Communication 363
    15.5.2 Agglomeration and Mapping 365
    15.5.3 Isoefficiency Analysis 365
15.6 Summary 367
15.7 Key Terms 367
15.8 Bibliographic Notes 367
15.9 Exercises 367

CHAPTER 16 Combinatorial Search 369
16.1 Introduction 369
16.2 Divide and Conquer 370
16.3 Backtrack Search 371
    16.3.1 Example 371
    16.3.2 Time and Space Complexity 374
16.4 Parallel Backtrack Search 374
16.5 Distributed Termination Detection 377
16.6 Branch and Bound 380
    16.6.1 Example 380
    16.6.2 Sequential Algorithm 382
    16.6.3 Analysis 385
16.7 Parallel Branch and Bound 385
    16.7.1 Storing and Sharing Unexamined Subproblems
    16.7.2 Efficiency 387
    16.7.3 Halting Conditions 387
16.8 Searching Game Trees
    16.8.2 Alpha-Beta Pruning 392
    16.8.3 Enhancements to Alpha-Beta Pruning 395
16.9 Parallel Alpha-Beta Search 395
    16.9.1 Parallel Aspiration Search 396
    16.9.2 Parallel Subtree Evaluation 396
    16.9.3 Distributed Tree Search 397
16.10 Summary
16.11 Key Terms
16.12 Bibliographic Notes 400
16.13 Exercises 401

CHAPTER 17 Shared-Memory Programming 404
17.1 Introduction 404
17.2 The Shared-Memory Model 405
17.3 Parallel for Loops 407
    17.3.1 parallel for Pragma 408
    17.3.2 Function omp_get_num_procs 410
    17.3.3 Function omp_set_num_threads 410
17.4 Declaring Private Variables 410
    17.4.1 private Clause 411
    17.4.2 firstprivate Clause 412
    17.4.3 lastprivate Clause 412
17.5 Critical Sections 413
    17.5.1 critical Pragma 415
17.6 Reductions 415
17.7 Performance Improvements 417
    17.7.1 Inverting Loops 417
    17.7.2 Conditionally Executing Loops 418
    17.7.3 Scheduling Loops 419
17.8 More General Data Parallelism 421
    17.8.1 parallel Pragma 422
    17.8.2 Function omp_get_thread_num 423
    17.8.3 Function omp_get_num_threads 425
    17.8.4 for Pragma 425
    17.8.5 single Pragma 427
    17.8.6 nowait Clause 427
17.9 Functional Parallelism 428
    17.9.1 parallel sections Pragma 429
    17.9.2 section Pragma 429
    17.9.3 sections Pragma 429
17.10 Summary 430
17.11 Key Terms
17.12 Bibliographic Notes 432
17.13 Exercises 433

CHAPTER 18 Combining MPI and OpenMP 436
18.1 Introduction 436
18.2 Conjugate Gradient Method 438
    18.2.1 MPI Program 438
    18.2.2 Functional Profiling 442
    18.2.3 Parallelizing Function matrix_vector_product 442
    18.2.4 Benchmarking 443
18.3 Jacobi Method 444
    18.3.1 Profiling MPI Program 444
    18.3.2 Parallelizing Function find_steady_state 444
    18.3.3 Benchmarking 446
18.4 Summary 448
18.5 Exercises 448

APPENDIX A MPI Functions 450

APPENDIX B Utility Functions 485
B.1 Header File MyMPI.h 485
B.2 Source File MyMPI.c 486

APPENDIX C Debugging MPI Programs 505
C.1 Introduction 505
C.2 Typical Bugs in MPI Programs 505
    C.2.1 Bugs Resulting in Deadlock 505
    C.2.2 Bugs … 506
    C.2.3 … 507
C.3 Practical Debugging 507

APPENDIX D Review of Complex Numbers 509

APPENDIX E OpenMP Functions 513

Bibliography 515
Author Index 520
Subject Index 522
PREFACE

This book is a practical introduction to parallel programming in C using the MPI (Message Passing Interface) library and the OpenMP application programming interface. It is targeted to upper-division undergraduate students, beginning graduate students, and computer professionals studying this material on their own. It assumes the reader has a good background in C programming and has had an introductory class in the analysis of algorithms. Fortran programmers interested in parallel programming can also benefit from this text. While the examples in the book are in C, the underlying concepts of parallel programming with MPI and OpenMP are essentially the same for both C and Fortran programmers.

In the past twenty years I have taught parallel programming to hundreds of undergraduate and graduate students. In the process I have learned a great deal about the sorts of problems people encounter when they begin "thinking in parallel" and writing parallel programs. Students benefit from seeing programs implemented step by step. My philosophy is to introduce new functionality "just in time." As much as possible, every new concept appears in the context of a design, implementation, or analysis problem. When you see an icon in a page margin, you'll know I'm presenting a key definition.

The first two chapters explain when and why parallel computing began and give a high-level overview of parallel architectures. Chapter 3 presents Foster's design methodology and shows how it is used through several case studies. Chapters 4-6, 8, and 9 demonstrate how to use the design methodology to develop MPI programs that solve a series of progressively more difficult programming problems. The 27 MPI functions presented in these chapters are a robust enough subset to implement parallel programs for a wide variety of applications. These chapters also introduce functions that simplify matrix and vector I/O. The source code for this I/O library appears in Appendix B.

The programs of Chapters 4, 5, 6, and 8 have been benchmarked on a commodity cluster of microprocessors, and these results appear in the text. Because new generations of microprocessors appear much faster than books can be produced, readers will observe that the processors are several generations old. The point of presenting the results is not to amaze the reader with the speed of the computations. Rather, the purpose of the benchmarking is to demonstrate that knowledge of the latency and bandwidth of the interconnection network, combined with information about the performance of a sequential program, are often sufficient to allow reasonably accurate predictions of the performance of a parallel program.

Chapter 7 focuses on four metrics for analyzing and predicting the performance of parallel systems: Amdahl's Law, Gustafson-Barsis's Law, the Karp-Flatt metric, and the isoefficiency metric. Chapters 10-16 provide additional examples of how to analyze a problem and design a good parallel algorithm to solve it. At this point the development of MPI programs implementing the parallel algorithms is left to the reader. Chapter 10 presents Monte Carlo methods and the challenges associated with parallel random number generation. Later chapters present a variety of key algorithms: matrix multiplication, Gaussian elimination, the conjugate gradient method, finite difference methods, sorting, the fast Fourier transform, backtrack search, branch-and-bound search, and alpha-beta search.

Chapters 17 and 18 are an introduction to the new shared-memory programming standard OpenMP. I present the features of OpenMP as needed to convert sequential code segments into parallel ones. I use two case studies to demonstrate the process of transforming MPI programs into hybrid MPI/OpenMP programs that can exhibit higher performance on multiprocessor clusters than programs based solely on MPI.

This book has more than enough material for a one-semester course in parallel programming. While parallel programming is more demanding than typical programming, it is also more rewarding. Even with a teacher's instruction and encouragement, most students are unnerved at the prospect of harnessing multiple processors to perform a single task. However, this fear is transformed into a feeling of genuine accomplishment when they see their debugged programs run much faster than "ordinary" C programs. For this reason, programming assignments should play a central role in the course. Fortunately, parallel computers are more accessible than ever. If a commercial parallel computer is not available, it is a straightforward task to build a small cluster out of a few PCs, networking equipment, and free software.

Figure P.1 illustrates the precedence relations among the chapters. A solid arrow from A to B indicates chapter B depends heavily upon material presented in chapter A. A dashed arrow from A to B indicates a weak dependence. If you cover the chapters in numerical order, you will satisfy all of these precedences. However, if you would like your students to start programming in C with MPI as quickly as possible, you may wish to skip Chapter 2 or only cover one or two sections of it. If you wish to focus on numerical algorithms, you may wish to skip Chapter 5 and introduce students to the function MPI_Bcast in another way. If you would like to start by having your students programming Monte Carlo algorithms, you can jump to Chapter 10 immediately after Chapter 4. If you want to cover OpenMP before MPI, you can jump to Chapter 17 after Chapter 3.

Figure P.1 Dependences among the chapters. A solid arrow indicates a strong dependence; a dashed arrow indicates a weak dependence.

I thank everyone at McGraw-Hill who helped me create this book, especially Betsy Jones, Michelle Flomenhoft, and Kay Brimeyer. Thank you for your sponsorship, encouragement, and assistance. I also appreciate the help provided by Maggie Murphy and the rest of the compositors at Interactive Composition.

I am indebted to the reviewers who carefully read the manuscript, correcting errors, pointing out weak spots, and suggesting additional topics. My thanks to: A. P. W. Bohm, Colorado State University; Thomas Cormen, Dartmouth College; Narsingh Deo, University of Central Florida; Philip J. Hatcher, University of New Hampshire; Nickolas S. Jovanovic, University of Arkansas at Little Rock; Dinesh Mehta, Colorado School of Mines; Zina Ben Miled, Indiana University-Purdue University, Indianapolis; Paul E. Plassman, Pennsylvania State University; Quinn O. Snell, Brigham Young University; Ashok Srinivasan, Florida State University; Xian-He Sun, Illinois Institute of Technology; Virgil Wallentine, Kansas State University; Bob Weems, University of Texas at Arlington; Kay Zemoudel, California State University, San Bernardino; and Jun Zhang, University of Kentucky.

People at Oregon State University also lent me a hand. Rubin Landau and Henri Jansen helped me understand Monte Carlo algorithms and the detailed balance condition, respectively. Students Charles Sauerbier and Bernd Michael Kelm suggested questions that made their way into the text. Tim Budd showed me how to incorporate PostScript figures into LaTeX documents. Jalal Haddad provided technical support. Thank you for your help!

I am grateful to my wife, Victoria, for encouraging me to get back into textbook writing. Thanks for the inspiring Christmas present: Chicken Soup for the Writer's Soul: Stories to Open the Heart and Rekindle the Spirit of Writers.

Michael J. Quinn
Oregon

CHAPTER 1

Motivation and History

Well done is quickly done.
—Caesar Augustus

1.1 INTRODUCTION

Are you one of those people for whom "fast" isn't fast enough? Today's workstations are about a hundred times faster than those made just a decade ago, but some computational scientists and engineers need even more speed. They make great simplifications to the problems they are solving and still must wait days or even weeks for their programs to finish running.

Faster computers let you tackle larger computations. Suppose you can afford to wait overnight for your program to produce a result. If your program ran 10 times faster, previously out-of-reach computations would now be within your grasp. You could produce in 15 hours an answer that previously required nearly a week to generate.

Of course, you could simply wait for CPUs to get faster. In about five years CPUs will be 10 times faster than they are today (a consequence of Moore's Law). On the other hand, if you can afford to wait five years, you must not be in much of a hurry! Parallel computing is a proven way to get the performance you need now.

What's parallel computing?

Parallel computing is the use of a parallel computer to reduce the time needed to solve a single computational problem. Parallel computing is now considered a standard way for computational scientists and engineers to solve problems in areas as diverse as galactic evolution, climate modeling, aircraft design, and molecular dynamics.

What's a parallel computer?

A parallel computer is a multiple-processor computer system supporting parallel programming. Two important categories of parallel computers are multicomputers and centralized multiprocessors.

As its name implies, a multicomputer is a parallel computer constructed out of multiple computers and an interconnection network. The processors on different computers interact by passing messages to each other.

In contrast, a centralized multiprocessor (also called a symmetrical multiprocessor or SMP) is a more highly integrated system in which all CPUs share access to a global memory. This shared memory supports communication and synchronization among processors.

We'll study centralized multiprocessors, multicomputers, and other parallel computer architectures in Chapter 2.

What's parallel programming?

Parallel programming is programming in a language that allows you to explicitly indicate how different portions of the computation may be executed concurrently by different processors. We'll discuss various kinds of parallel programming languages in more detail near the end of this chapter.

Is parallel programming really necessary?

A lot of research has been invested in the development of compiler technology that would allow ordinary Fortran 77 or C programs to be translated into codes that would execute with good efficiency on parallel computers with large numbers of processors. This is a very difficult problem, and while many experimental parallelizing¹ compilers have been developed, at the present time commercial systems are still in their infancy. The alternative is for you to write your own parallel programs.

¹parallelize, verb: to make parallel.

Why should I program using MPI and OpenMP?

MPI (Message Passing Interface) is a standard specification for message-passing libraries. Libraries meeting the standard are available on virtually every parallel computer system. Free libraries are also available in case you want to run MPI on a network of workstations or a parallel computer built out of commodity components (PCs and switches). If you develop programs using MPI, you will be able to reuse them when you get access to a newer, faster parallel computer.

Increasingly, parallel computers are constructed out of symmetrical multiprocessors (SMPs). Within each SMP, the CPUs have a shared address space. While MPI is a perfectly satisfactory way for processors in different SMPs to communicate with each other, OpenMP is a better way for processors within a single SMP to interact. In Chapter 18 you'll see an example of how a hybrid MPI/OpenMP program can outperform an MPI-only program on a practical application.

By working through this book, you'll learn a little bit about parallel computer hardware and a lot about parallel program development strategies. That includes parallel algorithm design and analysis, program implementation and debugging, and ways to benchmark and optimize your programs.

1.2 MODERN SCIENTIFIC METHOD

Classical science is based on observation, theory, and physical experimentation. Observation of a phenomenon leads to a hypothesis. The scientist develops a theory to explain the phenomenon and designs an experiment to test that theory. Sometimes the results of the experiment require the scientist to refine the theory, if not completely reject it. Here, observation may again take center stage.

Classical science is characterized by physical experiments and models. For example, many physics students have explored the relationship between mass, force, and acceleration using paper tape, pucks, and air tables. Physical experiments allow scientists to test theories, such as Newton's first law of motion, against observations.

In contrast, contemporary science is characterized by observation, theory, experimentation, and numerical simulation (Figure 1.1). Numerical simulation is an increasingly important tool for scientists, who often cannot use physical experiments to test theories because they may be too expensive or time-consuming, because they may be unethical, or because they may be impossible to perform. The modern scientist compares the behavior of a numerical simulation, which implements the theory, to data collected from nature. The differences cause the scientist to revise the theory and/or make more observations.

Figure 1.1 The introduction of numerical simulation distinguishes the contemporary scientific method from the classical scientific method.
Many important scientific problems are so complex that solving them via numerical simulation requires extraordinarily powerful computers. These complex problems, often called grand challenges for science, fall into several categories [73]:

1. Quantum chemistry, statistical mechanics, and relativistic physics
2. Cosmology and astrophysics
3. Computational fluid dynamics and turbulence
4. Materials design and superconductivity
5. Biology, pharmacology, genome sequencing, genetic engineering, enzyme activity, and cell modeling
6. Medicine, and modeling of human organs and bones
7. Global weather and environmental modeling

While grand challenge problems emerged in the late 1980s as a stimulus for further developments in high-performance computing, you can view the entire history of electronic computing as the quest for higher performance.

1.3 EVOLUTION OF SUPERCOMPUTING

The United States government has played a key role in the development and use of high-performance computers. During World War II the U.S. government funded the construction of the ENIAC in order to speed the calculation of artillery tables. In the 30 years after World War II, the U.S. government used high-performance computers to design nuclear weapons, break codes, and perform other national security-related applications.

Supercomputers are the most powerful computers that can be built. (As computer speeds increase, the bar for "supercomputer" status rises, too.) The term supercomputer first came into widespread use with the introduction of the Cray-1 supercomputer in 1976. The Cray-1 was a pipelined vector processor, not a multiple-processor computer, but it was capable of more than 100 million floating point operations per second.

Supercomputers have typically cost $10 million or more. The high cost of supercomputers once meant they were found almost exclusively in government research facilities, such as Los Alamos National Laboratory.

Over time, however, supercomputers began to appear outside of government facilities. In the late 1970s supercomputers showed up in capital-intensive industries. Petroleum companies harnessed supercomputers to help them look for oil, and automobile manufacturers started using these systems to improve fuel efficiency and safety.

Ten years later, hundreds of corporations around the globe were using supercomputers to support their business enterprises. The reason is simple: for many businesses, quicker computations lead to a competitive advantage. More rapid crash simulations can reduce the time an automaker needs to design a new car. Faster drug design can increase the number of patents held by a pharmaceutical firm. High-speed computers have even been used to design products as mundane as disposable diapers!

Computing speeds have risen dramatically in the past half century. The ENIAC could perform about 350 multiplications per second. Today's supercomputers are more than a billion times faster, able to perform trillions of floating point operations per second.

Single processors are about a million times faster than they were 50 years ago. Most of the speed increase is due to higher clock rates that enable a single operation to be performed more quickly. The remaining speed increase is due to greater system concurrency: allowing the system to work simultaneously on multiple operations. The history of computing has been marked by rapid progress on both of these fronts, as exemplified by contemporary high-performance microprocessors. Intel's Pentium 4 CPU, for example, has clock speeds well in excess of 1 GHz, two arithmetic-logic units (ALUs) clocked at twice the core processor clock speed, and extensive hardware support for out-of-order speculative execution of instructions.

How can today's supercomputers be a billion times faster than the ENIAC, if individual processors are only about a million times faster? The answer is that the remaining thousand-fold speed increase is achieved by collecting thousands of processors into an integrated system capable of solving individual problems faster than a single CPU; i.e., a parallel computer.

The meaning of the word supercomputer, then, has changed over time. In 1976 supercomputer meant a Cray-1, a single-CPU computer with a high-performance pipelined vector processor connected to a high-performance memory system. Today, supercomputer means a parallel computer with thousands of CPUs.

The invention of the microprocessor is a watershed event that led to the demise of traditional minicomputers and mainframes and spurred the development of low-cost parallel computers. Since the mid-1980s, microprocessor manufacturers have improved the performance of their top-end processors at an annual rate of 50 percent while keeping prices more or less constant [90]. The rapid increase in microprocessor speeds has completely changed the face of computing. Microprocessor-based servers now fill the role formerly played by minicomputers constructed out of gate arrays or off-the-shelf logic. Even mainframe computers are being constructed out of microprocessors.

1.4 MODERN PARALLEL COMPUTERS

Parallel computers only became attractive to a wide range of customers with the advent of Very Large Scale Integration (VLSI) in the late 1970s. Supercomputers such as the Cray-1 were far too expensive for most organizations. Experimental parallel computers were less expensive than supercomputers, but they were relatively costly. They were unreliable, to boot. VLSI technology allowed
