Parallel Computing on Distributed Memory Multiprocessors
NATO ASI Series
Advanced Science Institutes Series
A series presenting the results of activities sponsored by the NATO Science Committee, which aims at the dissemination of advanced scientific and technological knowledge, with a view to strengthening links between scientific communities.
The Series is published by an international board of publishers in conjunction with the NATO Scientific Affairs Division
A Life Sciences, B Physics: Plenum Publishing Corporation, London and New York
C Mathematical and Physical Sciences, D Behavioural and Social Sciences, E Applied Sciences: Kluwer Academic Publishers, Dordrecht, Boston and London
F Computer and Systems Sciences, G Ecological Sciences, H Cell Biology, I Global Environmental Change: Springer-Verlag, Berlin Heidelberg New York London Paris Tokyo Hong Kong Barcelona Budapest
NATO-PCO DATABASE
The electronic index to the NATO ASI Series provides full bibliographical references (with keywords and/or abstracts) to more than 30000 contributions from international scientists published in all sections of the NATO ASI Series. Access to the NATO-PCO DATABASE compiled by the NATO Publication Coordination Office is possible in two ways:
- via online FILE 128 (NATO-PCO DATABASE) hosted by ESRIN, Via Galileo Galilei, I-00044 Frascati, Italy.
- via CD-ROM "NATO-PCO DATABASE" with user-friendly retrieval software in English, French and German (© WTV GmbH and DATAWARE Technologies Inc. 1989).
The CD-ROM can be ordered through any member of the Board of Publishers or through NATO-PCO, Overijse, Belgium.
Series F: Computer and Systems Sciences, Vol. 103
Parallel Computing on Distributed Memory Multiprocessors
Edited by
Füsun Özgüner
Department of Electrical Engineering, The Ohio State University
205 Dreese Laboratory
2015 Neil Avenue, Columbus, OH 43210-1272, USA
Fikret Erçal
Computer Science Department, University of Missouri-Rolla
Rolla, MO 65401, USA
Springer-Verlag Berlin Heidelberg GmbH
Proceedings of the NATO Advanced Study Institute on Parallel Computing on Distributed Memory Multiprocessors, held at Bilkent University, Ankara, Turkey, July 1-13, 1991
CR Subject Classification (1991): D.1.3, C.1.2, G.1.0, D.4.2
ISBN 978-3-642-63460-4   ISBN 978-3-642-58066-6 (eBook)
DOI 10.1007/978-3-642-58066-6
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1993
Softcover reprint of the hardcover 1st edition 1993
Typesetting: Camera ready by authors
45/3140 - 5 4 3 2 1 0 - Printed on acid-free paper
Preface
The computational demands of many complex scientific and engineering problems cannot be met by a single traditional computer. Advances in microelectronics technology have made massively parallel computing a reality and triggered an outburst of research activity in both parallel processing architectures and algorithms. Distributed memory multiprocessors, the class of parallel computers consisting of microprocessors interconnected in a regular topology, have become commercially available in recent years and are increasingly being used to solve large problems in many application areas. The processors in such a system have their own local memories; they coordinate their computations and share data by sending and receiving messages. With the continuing increase in the performance of microprocessors, massively parallel computing offers the potential for solving very large problems that could not have been solved even a few years ago. However, to use these general-purpose computers for a specific application, existing algorithms need to be restructured for the architecture and new algorithms need to be developed. In fact, conventional algorithms need to be reexamined, since the best algorithm for a sequential computer may not be the best for a parallel computer. The performance of a specific computation on a distributed memory multiprocessor depends on the node and communication architecture, the interconnection network topology, the I/O subsystem, and the parallel algorithm and communication protocols. Each of these parameters is a complex problem in itself, and solutions require an understanding of the interactions among them.
This book is based on the papers presented at the NATO Advanced Study Institute on Parallel Computing on Distributed Memory Multiprocessors, held at Bilkent University, Ankara, Turkey, from July 1 to July 13, 1991. The book is organized in five parts. Part I consists of papers addressing parallel computer structures, configurations, and data communication mechanisms. The parallel algorithm designer faces the complex task of restructuring the computation so that it is distributed equally among the processors while minimizing interprocessor communication overhead. Parts II and V address parallel algorithms: numerical algorithms are presented in Part II, and other parallel algorithms, including those for image processing and database operations, are presented in Part V. Papers dealing with parallel programming issues are collected in Part III. Finally, Part IV discusses fault tolerance (the ability to continue operation in the presence of failures), covering the fault tolerance capabilities of hypercubes, hardware redundancy methods, and automatic support for fault tolerance in a distributed system.
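As a minimal sketch of what distributing a computation equally can mean in the simplest case, the Python below assumes n independent work items stored in a one-dimensional array and p processors. The function names `block_range` and `halo_neighbors` are hypothetical and chosen for illustration; they are not taken from any paper in this volume. Contiguous blocks whose sizes differ by at most one balance the load. For a nearest-neighbor computation, each block then needs boundary values from at most two other processors.

```python
def block_range(n: int, p: int, rank: int) -> range:
    """Indices owned by processor `rank` when n items are split into p
    contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    # The first `extra` processors each take one additional item.
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


def halo_neighbors(p: int, rank: int) -> list:
    """Ranks whose boundary values a block needs in a 1-D nearest-neighbor
    computation: only its immediate left and right neighbors."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < p]


if __name__ == "__main__":
    for r in range(4):
        print(r, list(block_range(10, 4, r)), halo_neighbors(4, r))
```

For example, `block_range(10, 4, 2)` is `range(6, 8)`. Splitting ten items over four processors gives block sizes 3, 3, 2, 2.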
We would like to thank our co-organizers, Prof. Cevdet Aykanat, for the local organization, and Prof. Özalp Babaoğlu, for his help in making this meeting possible. We would like to express our gratitude to Prof. Mithat Çoruh, president of Bilkent University, and Prof. Mehmet Baray, chairman of the Computer Engineering and Information Science Department, for their continuous support during the institute and for providing the facilities. Finally, we thank all Bilkent University students who helped during the meeting and the Ohio State University students and University of Missouri-Rolla staff who helped with the organization. Special thanks go to Anin Solak, Ozlem Ozge, Tahsin Kurç, Tevfik Bultan, Tunç Akman, Shobana Balakrishnan, Baback Izadi and Randie Gay.
July 1992    Füsun Özgüner
Fikret Erçal
Contents
I. Parallel Computing Structures and Communication
Mechanisms for Parallel Computers 3
William J. Dally, D. Scott Wills, and Richard Lethin
Reconfigurable Mesh Algorithms For Fundamental Data Manipulation Operations 27
Jing-Fu Jenq and Sartaj Sahni
Spanning Trees and Communication Primitives on Hypercubes 47
Ching-Tien Ho
The Effect of Configurations and Algorithms on Performance 77
Derek J. Paddon and Alan G. Chalmers
Dedicated and General-Purpose Systems for Parallel Application Development 99
Antonino Mazzeo
II. Parallel Numerical Algorithms
Parallel Direct Solution of Sparse Linear Systems 119
Kalluri Eswar, P. Sadayappan, and V. Visvanathan
The Performance of Linear Algebra Algorithms on Intel Parallel Supercomputers 143
David S. Scott
Sparse LU-Decomposition for Chemical Process Flowsheeting on a Multicomputer 151
Fikret Erçal, Neil L. Book, and Sinar Pait
III. Parallel Programming
Distributed Control Algorithms (Selected Topics) 167
Friedemann Mattern
A Data-Driven Environment For A Multiprocessor System 187
Jean-Luc Gaudiot
Critical Path Length of Large Acyclic Task Graphs 195
Erol Gelenbe
Logic Program Execution on Distributed Memory Parallel Computers 205
Mario Cannataro, Giandomenico Spezzano, and Domenico Talia
IV. Fault Tolerance
Tools and Techniques for Adding Fault Tolerance to Distributed and Parallel Programs 219
Özalp Babaoğlu
Fault Tolerance in Hypercubes 233
Shobana Balakrishnan, Füsun Özgüner and Baback Izadi
V. Applications, Algorithms
Parallel Relational Database Algorithms 263
Øystein Torbjørnsen
High Quality Image Synthesis on Distributed Memory Multiprocessors 283
Thierry Priol
Parallel Implementation of the Backpropagation Algorithm on Hypercube Systems 301
Cevdet Aykanat, Kemal Oflazer, and Radwan Tahboub
Random Number Generation for Parallel Computers 315
Srinivas Aluru and G. M. Prabhu
List of Participants 321
Subject Index 327
I. Parallel Computing Structures and Communication
Mechanisms for Parallel Computers¹
William J. Dally*, D. Scott Wills†, and Richard Lethin*
*Artificial Intelligence Laboratory and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
†Georgia Institute of Technology, Department of Electrical Engineering, Atlanta, Georgia 30332
Abstract: Most existing parallel computers have a similar structure. However, they are specialized to a particular parallel programming model by a set of special-purpose, hardwired mechanisms. We propose a set of generalized mechanisms for parallel computing that efficiently support most proposed parallel programming models. The mechanisms we propose are primitive hardware mechanisms from which more complex, model-specific mechanisms can be built. A send mechanism transmits data to a remote node and optionally allocates storage and creates a process. A data synchronization mechanism associates synchronization state with each addressable word of storage and allows processes to synchronize on accesses to data. Naming is performed with a map mechanism that associates virtual addresses with remote nodes or local storage locations. To support parallel programming models efficiently using these mechanisms, a processor must be able to switch processes and handle exceptions rapidly. In this paper, we describe these mechanisms, compare them to alternatives, and give examples of their use.
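To make the three mechanisms named in the abstract concrete, here is a toy software model in Python. It is built on our own assumptions and is not the hardware the authors describe. Every class and method name (`SyncWord`, `Node`, `Machine`, `translate`, `send`) is hypothetical.

The model makes the following simplifications:
- Each word carries a full/empty flag as its synchronization state.
- A suspended "process" is represented by a callback.
- The optional process creation of the send mechanism is approximated by running a handler on the destination node.
- Storage allocation, process switching, and exception handling are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Toy model of send, data synchronization, and map mechanisms.
# All names are hypothetical; this is not the authors' hardware design.


@dataclass
class SyncWord:
    """One addressable word plus its synchronization state."""
    value: Any = None
    full: bool = False  # synchronization state: has the word been written?
    waiters: List[Callable[[Any], None]] = field(default_factory=list)


class Node:
    """A processing node with its own local memory of SyncWords."""

    def __init__(self, node_id: int, size: int) -> None:
        self.node_id = node_id
        self.memory = [SyncWord() for _ in range(size)]

    def read(self, addr: int, resume: Callable[[Any], None]) -> None:
        """Synchronizing read: deliver the value if the word is full,
        otherwise suspend the reader by queueing its continuation."""
        word = self.memory[addr]
        if word.full:
            resume(word.value)
        else:
            word.waiters.append(resume)

    def write(self, addr: int, value: Any) -> None:
        """Synchronizing write: fill the word, then resume suspended readers."""
        word = self.memory[addr]
        word.value, word.full = value, True
        waiters, word.waiters = word.waiters, []
        for resume in waiters:
            resume(value)


class Machine:
    """Nodes plus a map from virtual addresses to (node id, local address)."""

    def __init__(self, num_nodes: int, words_per_node: int) -> None:
        self.nodes = [Node(i, words_per_node) for i in range(num_nodes)]
        self.map: Dict[int, Tuple[int, int]] = {}

    def translate(self, vaddr: int) -> Tuple[int, int]:
        """Map mechanism: resolve a virtual address to a node and location."""
        return self.map[vaddr]

    def send(self, vaddr: int, value: Any,
             handler: Optional[Callable[[Node, int, Any], None]] = None) -> None:
        """Send mechanism: deliver data to the node that holds vaddr. If a
        handler is given, run it there (a stand-in for creating a process);
        otherwise perform a synchronizing write."""
        node_id, addr = self.translate(vaddr)
        node = self.nodes[node_id]
        if handler is None:
            node.write(addr, value)
        else:
            handler(node, addr, value)


if __name__ == "__main__":
    m = Machine(num_nodes=2, words_per_node=4)
    m.map[100] = (1, 2)             # virtual address 100 lives on node 1, word 2
    got: List[Any] = []
    m.nodes[1].read(2, got.append)  # word is empty, so the reader is suspended
    m.send(100, 42)                 # remote write fills the word and resumes it
    print(got)                      # [42]
```

In this model, reading an empty word never returns stale data. The reader simply waits until a `send` fills the word. This is the kind of per-word synchronization on data accesses the abstract describes, here expressed in software rather than hardware.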
Keywords: Parallel computers, concurrent computers, parallel processing, computer
architecture, memory management, synchronization.
1 Introduction
1.1 Parallel Machines have Similar Structure but Different Mechanisms
To exploit the cost/performance advantage of parallel computing, several classes of parallel machines have emerged, including shared memory multiprocessors [23, 19, 1], synchronous and asynchronous message-passing multicomputers [5, 6, 4, 26], dataflow and reduction machines [36], and SIMD machines [20, 28]. Each of these classes is specialized for a particular parallel programming discipline or model of computation. For example, the
¹The research described in this paper was supported in part by the Defense Advanced Research Projects Agency under contracts N00014-88K-0738 and N00014-87K-0825 and in part by a National Science Foundation Presidential Young Investigator Award, grant MIP-8657531, with matching funds from General Electric Corporation and IBM Corporation. Richard Lethin is supported by a fellowship from the John and Fannie Hertz Foundation.