Parallel Computing on Distributed Memory Multiprocessors

NATO ASI Series
Advanced Science Institutes Series

A series presenting the results of activities sponsored by the NATO Science Committee, which aims at the dissemination of advanced scientific and technological knowledge, with a view to strengthening links between scientific communities.

The Series is published by an international board of publishers in conjunction with the NATO Scientific Affairs Division

A Life Sciences, B Physics: Plenum Publishing Corporation, London and New York
C Mathematical and Physical Sciences, D Behavioural and Social Sciences, E Applied Sciences: Kluwer Academic Publishers, Dordrecht, Boston and London
F Computer and Systems Sciences, G Ecological Sciences, H Cell Biology, I Global Environmental Change: Springer-Verlag, Berlin Heidelberg New York London Paris Tokyo Hong Kong Barcelona Budapest

NATO-PCO DATABASE

The electronic index to the NATO ASI Series provides full bibliographical references (with keywords and/or abstracts) to more than 30000 contributions from international scientists published in all sections of the NATO ASI Series. Access to the NATO-PCO DATABASE compiled by the NATO Publication Coordination Office is possible in two ways:
- via online FILE 128 (NATO-PCO DATABASE) hosted by ESRIN, Via Galileo Galilei, I-00044 Frascati, Italy.
- via CD-ROM "NATO-PCO DATABASE" with user-friendly retrieval software in English, French and German (© WTV GmbH and DATAWARE Technologies Inc. 1989).
The CD-ROM can be ordered through any member of the Board of Publishers or through NATO-PCO, Overijse, Belgium.

Series F: Computer and Systems Sciences Vol. 103

Edited by

Füsun Özgüner
Department of Electrical Engineering, The Ohio State University
205 Dreese Laboratory, 2015 Neil Avenue, Columbus, OH 43210-1272, USA

Fikret Erçal
Computer Science Department, University of Missouri-Rolla
Rolla, MO 65401, USA

Springer-Verlag Berlin Heidelberg GmbH

Proceedings of the NATO Advanced Study Institute on Parallel Computing on Distributed Memory Multiprocessors, held at Bilkent University, Ankara, Turkey, July 1-13, 1991

CR Subject Classification (1991): D.1.3, C.1.2, G.1.0, D.4.2

ISBN 978-3-642-63460-4
ISBN 978-3-642-58066-6 (eBook)
DOI 10.1007/978-3-642-58066-6

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1993
Softcover reprint of the hardcover 1st edition 1993
Typesetting: Camera ready by authors
45/3140 - 5 4 3 2 1 0 - Printed on acid-free paper

Preface

The computational demands of many complex scientific and engineering problems cannot be met by a single traditional computer. The advances in microelectronics technology have made massively parallel computing a reality and triggered an outburst of research activity in both parallel processing architectures and algorithms. The class of parallel computers referred to as distributed memory multiprocessors, which consist of microprocessors interconnected in a regular topology, has been commercially available in recent years and is increasingly being used to solve large problems in many application areas. The processors in such a system have their own local memories and coordinate their computations and share data by sending/receiving messages. With the continuing increase in the performance of microprocessors, massively parallel computing offers the potential for solving very large problems that could not even be solved a few years ago. However, in order to use these general purpose computers for a specific application, existing algorithms need to be restructured for the architecture and new algorithms developed.
In fact, conventional algorithms need to be reexamined, since the best algorithm for a sequential computer may not be the best for a parallel computer. Thus the performance of a specific computation on a distributed memory multiprocessor is affected by the node and communication architecture, the interconnection network topology, the I/O subsystem, and the parallel algorithm and communication protocols. Each of these parameters is a complex problem in itself, and solutions require an understanding of the interactions among them.

This book is based on the papers presented at the NATO Advanced Study Institute on Parallel Computing on Distributed Memory Multiprocessors held at Bilkent University, Ankara, Turkey from July 1 to July 13, 1991. The book is organized in five parts. Part I consists of papers addressing parallel computer structures, configurations and data communication mechanisms. The parallel algorithm designer is faced with the complex task of restructuring the computations so that they are distributed equally among the processors in a way that minimizes the interprocessor communication overhead. Parts II and V address parallel algorithms. Numerical algorithms are presented in Part II, and parallel algorithms including those for image processing and database operations are presented in Part V. Papers dealing with parallel programming issues are collected in Part III. Finally, fault tolerance (the ability to continue operation in the presence of failures) capabilities of hypercubes, hardware redundancy methods and automatic support for fault tolerance in a distributed system are discussed in Part IV.

We would like to thank our co-organizers Prof. Cevdet Aykanat for the local organization and Prof. Ozalp Babaoglu for his help in making this meeting possible. We would like to express our gratitude to Prof. Mithat Çoruh, president of Bilkent University, and Prof.
Mehmet Baray, chairman of the Computer Engineering and Information Science Department, for their continuous support during the institute and for providing the facilities. Finally, we thank all Bilkent University students who helped during the meeting and the Ohio State University students and University of Missouri-Rolla staff, who helped with the organization. Special thanks go to Anin Solak, Ozlem Ozge, Tahsin Kurç, Tevfik Bultan, Tunç Akman, Shobana Balakrishnan, Baback Izadi and Randie Gay.

July 1992
Füsun Özgüner
Fikret Erçal

Contents

I. Parallel Computing Structures and Communication

Mechanisms for Parallel Computers
    William J. Dally, D. Scott Wills, and Richard Lethin
Reconfigurable Mesh Algorithms For Fundamental Data Manipulation Operations
    Jing-Fu Jenq and Sartaj Sahni
Spanning Trees and Communication Primitives on Hypercubes
    Ching-Tien Ho
The Effect of Configurations and Algorithms on Performance
    Derek J. Paddon and Alan G. Chalmers
Dedicated and General-Purpose Systems for Parallel Application Development
    Antonino Mazzeo

II. Parallel Numerical Algorithms

Parallel Direct Solution of Sparse Linear Systems
    Kalluri Eswar, P. Sadayappan, and V. Visvanathan
The Performance of Linear Algebra Algorithms on Intel Parallel Supercomputers
    David S. Scott
Sparse LU-Decomposition for Chemical Process Flowsheeting on a Multicomputer
    Fikret Ercal, Neil L. Book, and Sinar Pait

III. Parallel Programming

Distributed Control Algorithms (Selected Topics)
    Friedemann Mattern
A Data-Driven Environment For A Multiprocessor System
    Jean-Luc Gaudiot
Critical Path Length of Large Acyclic Task Graphs
    Erol Gelenbe
Logic Program Execution on Distributed Memory Parallel Computers
    Mario Cannataro, Giandomenico Spezzano, and Domenico Talia

IV. Fault Tolerance

Tools and Techniques for Adding Fault Tolerance to Distributed and Parallel Programs
    Ozalp Babaoglu
Fault Tolerance in Hypercubes
    Shobana Balakrishnan, Füsun Özgüner and Baback Izadi

V. Applications, Algorithms

Parallel Relational Database Algorithms
    Øystein Torbjørnsen
High Quality Image Synthesis on Distributed Memory Multiprocessors
    Thierry Priol
Parallel Implementation of the Backpropagation Algorithm on Hypercube Systems
    Cevdet Aykanat, Kemal Oflazer, and Radwan Tahboub
Random Number Generation for Parallel Computers
    Srinivas Aluru and G. M. Prabhu

List of Participants
Subject Index

I. Parallel Computing Structures and Communication

Mechanisms for Parallel Computers ¹

William J. Dally*, D. Scott Wills†, and Richard Lethin*

* Artificial Intelligence Laboratory and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
† Georgia Institute of Technology, Department of Electrical Engineering, Atlanta, Georgia 30332

Abstract: Most existing parallel computers have similar structure. However, they are specialized to a particular parallel programming model by a set of special-purpose, hardwired mechanisms. We propose a set of generalized mechanisms for parallel computing that efficiently support most proposed parallel programming models. The mechanisms we propose are primitive hardware mechanisms from which more complex model-specific mechanisms can be built. A send mechanism transmits data to a remote node and optionally allocates storage and creates a process. A data synchronization mechanism associates synchronization state with each addressable word of storage and allows processes to synchronize on accesses to data. Naming is performed with a map mechanism that associates virtual addresses with remote nodes or local storage locations.
To efficiently support parallel programming models using these mechanisms, a processor must be able to switch processes and handle exceptions rapidly. In this paper, we describe these mechanisms, compare them to alternatives, and give examples of their use.

Keywords: Parallel computers, concurrent computers, parallel processing, computer architecture, memory management, synchronization.

¹ The research described in this paper was supported in part by the Defense Advanced Research Projects Agency under contracts N00014-88K-0738 and N00014-87K-0825 and in part by a National Science Foundation Presidential Young Investigator Award, grant MIP-8657531, with matching funds from General Electric Corporation and IBM Corporation. Richard Lethin is supported by a fellowship from the John and Fannie Hertz Foundation.

1 Introduction

1.1 Parallel Machines have Similar Structure but Different Mechanisms

To exploit the cost/performance advantage of parallel computing, several classes of parallel machines have emerged including shared memory multiprocessors [23, 19, 1], synchronous and asynchronous message-passing multicomputers [5, 6, 4, 26], dataflow and reduction machines [36], and SIMD machines [20, 28]. Each of these classes is specialized for a particular parallel programming discipline or model of computation. For example, the

