
Using Advanced MPI: Modern Features of the Message-Passing Interface PDF

376 Pages · 2014 · 4.669 MB · English

Preview Using Advanced MPI: Modern Features of the Message-Passing Interface

Using Advanced MPI

Scientific and Engineering Computation
William Gropp and Ewing Lusk, editors; Janusz Kowalik, founding editor
A complete list of books published in the Scientific and Engineering Computation series appears at the back of this book.

Using Advanced MPI: Modern Features of the Message-Passing Interface
William Gropp, Torsten Hoefler, Rajeev Thakur, and Ewing Lusk
The MIT Press, Cambridge, Massachusetts; London, England

© 2014 Massachusetts Institute of Technology. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in LaTeX by the authors and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Gropp, William.
Using advanced MPI: modern features of the Message-Passing Interface / William Gropp, Torsten Hoefler, Rajeev Thakur, and Ewing Lusk.
p. cm. -- (Scientific and engineering computation)
Includes bibliographical references and index.
ISBN 978-0-262-52763-7 (pbk.: alk. paper)
1. Parallel programming (Computer science). 2. Parallel computers -- Programming. 3. Computer interfaces.
I. Hoefler, Torsten. II. Thakur, Rajeev. III. Lusk, Ewing. IV. Title.
QA76.642.G758 2014
005.7'11--dc23
2014033725

To Clare Gropp, Natalia (Tasche), Pratibha and Sharad Thakur, and Brigid Lusk

Contents

Series Foreword
Foreword
Preface
1 Introduction
  1.1 MPI-1 and MPI-2
  1.2 MPI-3
  1.3 Parallelism and MPI
    1.3.1 Conway's Game of Life
    1.3.2 Poisson Solver
  1.4 Passing Hints to the MPI Implementation with MPI_Info
    1.4.1 Motivation, Description, and Rationale
    1.4.2 An Example from Parallel I/O
  1.5 Organization of This Book
2 Working with Large-Scale Systems
  2.1 Nonblocking Collectives
    2.1.1 Example: 2-D FFT
    2.1.2 Example: Five-Point Stencil
    2.1.3 Matching, Completion, and Progression
    2.1.4 Restrictions
    2.1.5 Collective Software Pipelining
    2.1.6 A Nonblocking Barrier?
    2.1.7 Nonblocking Allreduce and Krylov Methods
  2.2 Distributed Graph Topologies
    2.2.1 Example: The Peterson Graph
    2.2.2 Edge Weights
    2.2.3 Graph Topology Info Argument
    2.2.4 Process Reordering
  2.3 Collective Operations on Process Topologies
    2.3.1 Neighborhood Collectives
    2.3.2 Vector Neighborhood Collectives
    2.3.3 Nonblocking Neighborhood Collectives
  2.4 Advanced Communicator Creation
    2.4.1 Nonblocking Communicator Duplication
    2.4.2 Noncollective Communicator Creation
3 Introduction to Remote Memory Operations
  3.1 Introduction
  3.2 Contrast with Message Passing
  3.3 Memory Windows
    3.3.1 Hints on Choosing Window Parameters
    3.3.2 Relationship to Other Approaches
  3.4 Moving Data
    3.4.1 Reasons for Using Displacement Units
    3.4.2 Cautions in Using Displacement Units
    3.4.3 Displacement Sizes in Fortran
  3.5 Completing RMA Data Transfers
  3.6 Examples of RMA Operations
    3.6.1 Mesh Ghost Cell Communication
    3.6.2 Combining Communication and Computation
  3.7 Pitfalls in Accessing Memory
    3.7.1 Atomicity of Memory Operations
    3.7.2 Memory Coherency
    3.7.3 Some Simple Rules for RMA
    3.7.4 Overlapping Windows
    3.7.5 Compiler Optimizations
  3.8 Performance Tuning for RMA Operations
    3.8.1 Options for MPI_Win_create
    3.8.2 Options for MPI_Win_fence
4 Advanced Remote Memory Access
  4.1 Passive Target Synchronization
  4.2 Implementing Blocking, Independent RMA Operations
  4.3 Allocating Memory for MPI Windows
    4.3.1 Using MPI_Alloc_mem and MPI_Win_allocate from C
    4.3.2 Using MPI_Alloc_mem and MPI_Win_allocate from Fortran 2008
    4.3.3 Using MPI_ALLOC_MEM and MPI_WIN_ALLOCATE from Older Fortran
  4.4 Another Version of NXTVAL
    4.4.1 The Nonblocking Lock
    4.4.2 NXTVAL with MPI_Fetch_and_op
    4.4.3 Window Attributes
  4.5 An RMA Mutex
  4.6 Global Arrays
    4.6.1 Create and Free
    4.6.2 Put and Get
    4.6.3 Accumulate
    4.6.4 The Rest of Global Arrays
  4.7 A Better Mutex
  4.8 Managing a Distributed Data Structure
    4.8.1 A Shared-Memory Distributed List Implementation
    4.8.2 An MPI Implementation of a Distributed List
    4.8.3 Inserting into a Distributed List
    4.8.4 An MPI Implementation of a Dynamic Distributed List
    4.8.5 Comments on More Concurrent List Implementations
  4.9 Compiler Optimization and Passive Targets
  4.10 MPI RMA Memory Models
  4.11 Scalable Synchronization
    4.11.1 Exposure and Access Epochs
    4.11.2 The Ghost-Point Exchange Revisited
    4.11.3 Performance Optimizations for Scalable Synchronization
  4.12 Summary
5 Using Shared Memory with MPI
  5.1 Using MPI Shared Memory
    5.1.1 Shared On-Node Data Structures
    5.1.2 Communication through Shared Memory
    5.1.3 Reducing the Number of Subdomains
  5.2 Allocating Shared Memory
  5.3 Address Calculation
6 Hybrid Programming
  6.1 Background
  6.2 Thread Basics and Issues
    6.2.1 Thread Safety
    6.2.2 Performance Issues with Threads
    6.2.3 Threads and Processes
  6.3 MPI and Threads
  6.4 Yet Another Version of NXTVAL
  6.5 Nonblocking Version of MPI_Comm_accept
  6.6 Hybrid Programming with MPI
  6.7 MPI Message and Thread-Safe Probe
7 Parallel I/O
  7.1 Introduction
  7.2 Using MPI for Simple I/O
    7.2.1 Using Individual File Pointers
    7.2.2 Using Explicit Offsets
    7.2.3 Writing to a File
  7.3 Noncontiguous Accesses and Collective I/O
    7.3.1 Noncontiguous Accesses
    7.3.2 Collective I/O
  7.4 Accessing Arrays Stored in Files
    7.4.1 Distributed Arrays
    7.4.2 A Word of Warning about Darray
    7.4.3 Subarray Datatype Constructor
    7.4.4 Local Array with Ghost Area
    7.4.5 Irregularly Distributed Arrays

