Rachid Guerraoui · Luís Rodrigues
Introduction to
Reliable Distributed
Programming
With 31 Figures
Authors
Rachid Guerraoui
École Polytechnique Fédérale
de Lausanne (EPFL)
Faculté Informatique et Communications
Laboratoire Programmation Distribué (LPD)
Station 14
1015 Lausanne, Switzerland
Rachid.Guerraoui@epfl.ch
Luís Rodrigues
Universidade Lisboa
Faculdade de Ciências
Departamento de Informática
Bloco C6, Campo Grande
1749-016 Lisboa, Portugal
ler@di.fc.ul.pt
Library of Congress Control Number: 2006920522
ACM Computing Classification (1998): C.2, F.2, G.2
ISBN-10 3-540-28845-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-28845-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the
whole or part of the material is concerned, specifically the rights of
translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks.
Duplication of this publication or parts thereof is permitted only under
the provisions of the German Copyright Law of September 9, 1965, in its
current version, and permission for use must always be obtained from
Springer. Violations are liable for prosecution under the German
Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc.
in this publication does not imply, even in the absence of a specific
statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typeset by the authors
Production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig
Cover design: KünkelLopka Werbeagentur, Heidelberg
Printed on acid-free paper 45/3100/YL - 5 4 3 2 1 0
To May, Maria and Sarah.
To Hugo and Sara.
Preface
This book aims at offering an introductory description of distributed pro-
gramming abstractions and of the algorithms that are used to implement
them in different distributed environments. The reader is provided with an
insight into important problems in distributed computing, knowledge about
the main algorithmic techniques that can be used to solve these problems,
and examples of how to apply these techniques when building distributed
applications.
Content
In modern computing, a program usually executes on several processes: in
this context, a process is an abstraction that may represent a computer, a
processor within a computer, or simply a specific thread of execution within
a processor. The fundamental problem in devising such distributed programs
usually consists in having the processes cooperate on some common task. Of
course, traditional centralized algorithmic issues, on each process individu-
ally, still need to be dealt with. The added difficulty here is in achieving a
robust form of cooperation, despite failures or disconnections of some of the
processes, inherent to most distributed environments.
Were no notion of cooperation required, a distributed program would
simply consist of a set of detached centralized programs, each running on a
specific process, and little benefit could be obtained from the availability of
several machines in a distributed environment. It was the need for coopera-
tion that revealed many of the fascinating problems addressed by this book,
problems that need to be solved to make distributed computing a reality.
The book not only exposes the reader to these problems but also presents ways
to solve them in different contexts.
Not surprisingly, distributed programming can be significantly simplified
if the difficulty of robust cooperation is encapsulated within specific
abstractions. By encapsulating all the tricky algorithmic issues, such
distributed programming abstractions bridge the gap between network
communication layers, usually frugal in terms of reliability guarantees,
and distributed application layers, usually demanding in terms of reliability.
The book presents various distributed programming abstractions and
describes algorithms that implement these abstractions. In a sense, we give
the distributed application programmer a library of abstraction interface
specifications, and give the distributed system builder a library of algorithms that
implement the specifications.
A significant amount of the preparation time of this book was devoted
to preparing the exercises and working out their solutions. We strongly en-
courage the reader to work out the exercises. We believe that no reasonable
understanding can be achieved in a passive way. This is especially true in
the field of distributed computing where a possible underlying
anthropomorphism may provide easy but wrong intuitions. Many exercises are
rather easy and can be discussed within an undergraduate teaching classroom.
Some exercises are more difficult and need more time. These can typically be
given as homework.
The book comes with a companion set of running examples implemented
in the Java programming language, using the Appia protocol composition
framework. These examples can be used by students to get a better
understanding of many implementation details that are not covered in the
high-level description of the algorithms given in the core of the chapters.
Understanding such details usually makes a big difference when moving to a
practical environment. Instructors can use the protocol layers as a basis
for practical experimentation, by suggesting that students optimize the
protocols already given in the framework, implement variations of these
protocols for different system models, or develop application prototypes
that make use of the protocols.
Presentation
The book is written in a self-contained manner. This has been made pos-
sible because the field of distributed algorithms has reached a certain level
of maturity where details, for instance, about the network and various kinds
of failures, can be abstracted away when reasoning about the distributed
algorithms. Elementary notions of algorithms, first-order logic,
programming languages, networking, and operating systems might be helpful,
but we believe that most of our abstraction specifications and algorithms
can be understood with minimal knowledge about these notions.
The book follows an incremental approach and was primarily built as a
textbook for teaching at the undergraduate or basic graduate level. It intro-
duces basic elements of distributed computing in an intuitive manner and
builds sophisticated distributed programming abstractions on top of more
primitive ones. Whenever we devise algorithms to implement a given ab-
straction, we consider a simple distributed system model first, and then we
revisit the algorithms in more challenging models. In other words, we first
devise algorithms by making strong simplifying assumptions on the distributed
environment, and then we discuss how to weaken those assumptions.
We have tried to balance intuition and presentation simplicity, on the
one hand, with rigor, on the other hand. Sometimes rigor was affected, and
this might not have been always on purpose. The focus here is rather on ab-
straction specifications and algorithms, not on calculability and complexity.
Indeed, there is no theorem in this book. Correctness arguments are given
with the aim of better understanding the algorithms: they are not formal
correctness proofs per se. In fact, we tried to avoid Greek letters and
mathematical notations: references are given to papers and books with more
formal treatments of some of the material presented here.
Organization
• In Chapter 1 we motivate the need for distributed programming abstrac-
tions by discussing various applications that typically make use of such
abstractions. The chapter also presents the programming notations used
in the book to describe specifications and algorithms.
• In Chapter 2 we present different kinds of assumptions that we will be
making about the underlying distributed environment, i.e., we present
different distributed system models. Basically, we describe the basic
abstractions on which more sophisticated ones are built. These include
process and communication link abstractions. This chapter might be
considered as a reference throughout other chapters.
The rest of the chapters are each devoted to one family of related
abstractions, and to various algorithms implementing them. We will start
from primitive abstractions, and then use them to build more sophisticated ones.
• In Chapter 3 we introduce specific distributed programming abstractions:
those related to the reliable delivery of messages that are broadcast to a
group of processes. We cover here issues such as how to make sure that a
message delivered by one process is delivered by all processes, despite the
crash of the original sender process.
• In Chapter 4 we discuss shared memory abstractions which encapsulate
simple forms of distributed storage objects with read-write semantics, e.g.,
files and register abstractions. We cover here issues like how to ensure
that a value written (stored) within a set of processes is eventually read
(retrieved) despite the crash of some of the processes.
• In Chapter 5 we address the consensus abstraction through which a set
of processes can decide on a common value, based on values each process
initially proposed, despite the crash of some of the processes.
• In Chapter 6 we consider variants of consensus such as atomic broadcast,
terminating reliable broadcast, (non-blocking) atomic commitment, group
membership, and view-synchronous communication.
The distributed algorithms we will study differ naturally according to the
actual abstraction they aim at implementing, but also according to the
assumptions on the underlying distributed environment (we will also say
distributed system model), i.e., according to the initial abstractions they
take for granted. Aspects such as the reliability of the links, the degree
of synchrony of the system, and whether a deterministic or a randomized
(probabilistic) solution is
sought have a fundamental impact on how the algorithm is designed.
To give the reader an insight into how these parameters affect the algo-
rithm design, the book includes several classes of algorithmic solutions to
implement the same distributed programming abstractions for various dis-
tributed system models.
Covering all chapters, with their associated exercises, constitutes a full
course in the field. Focusing on each chapter solely for the specifications of
the abstractions and their underlying algorithms in their simplest form, i.e.,
for the simplest model of computation considered in the book (fail-stop),
would constitute a shorter, more elementary course. Such a course could
provide a nice companion to a more practice-oriented course possibly based
on our protocol framework.
References
We have been exploring the world of distributed programming abstractions
for more than a decade now. The material of this book was influenced by many
researchers in the field of distributed computing. A special mention is due
to Leslie Lamport and Nancy Lynch for having posed fascinating problems
in distributed computing, and to the Cornell school of reliable distributed
computing, including Ozalp Babaoglu, Ken Birman, Keith Marzullo, Robbert
van Renesse, Rick Schlichting, Fred Schneider, and Sam Toueg.
Many other researchers have directly or indirectly inspired the material of
this book. We did our best to reference their work throughout the text. Most
chapters end with a historical note. The note is intended to provide hints
for further reading, to trace the history of the concepts presented in the
chapters, and to give credit to those who invented and worked out the
concepts. At
the end of the book, we reference other books for further readings on other
aspects of distributed computing.
Acknowledgments
We would like to express our deepest gratitude to our undergraduate and
graduate students from École Polytechnique Fédérale de Lausanne (EPFL)
and the University of Lisboa (UL), for serving as reviewers of preliminary
drafts of this book. Indeed, they had no choice and needed to prepare their
exams anyway! But they were indulgent toward the bugs and typos that could
be found in earlier versions of the book as well as associated slides, and they
provided us with useful feedback.
Partha Dutta, Corine Hari, Michal Kapalka, Petr Kouznetsov, Ron Levy,
Maxime Monod, Bastian Pochon, and Jesper Spring, graduate students from
the School of Computer and Communication Sciences of EPFL, Filipe Araújo
and Hugo Miranda, graduate students from the Distributed Algorithms and
Network Protocol (DIALNP) group at the Departamento de Informática
da Faculdade de Ciências da Universidade de Lisboa (UL), Leila Khalil
and Robert Basmadjian, graduate students from the Lebanese University
in Beirut, as well as Ali Ghodsi, graduate student from the Swedish Institute
of Computer Science (SICS) in Stockholm, suggested many improvements to
the algorithms presented in the book.
Several implementations for the “hands-on” part of the book were
developed by, or with the help of, Alexandre Pinto, a key member of the Appia
team, complemented with inputs from several DIALNP team members and
students, including Nuno Carvalho, Maria João Monteiro, and Luís Sardinha.
Finally, we would like to thank all our colleagues who were kind enough to
comment on earlier drafts of this book. These include Felix Gaertner, Benoit
Garbinato and Maarten van Steen.
Rachid Guerraoui and Luís Rodrigues
Contents
1. Introduction
1.1 Motivation
1.2 Distributed Programming Abstractions
1.2.1 Inherent Distribution
1.2.2 Distribution as an Artifact
1.3 The End-to-End Argument
1.4 Software Components
1.4.1 Composition Model
1.4.2 Programming Interface
1.4.3 Modules
1.4.4 Classes of Algorithms
1.5 Hands-On
1.5.1 Print Module
1.5.2 BoundedPrint Module
1.5.3 Composing Modules
2. Basic Abstractions
2.1 Distributed Computation
2.1.1 Processes and Messages
2.1.2 Automata and Steps
2.1.3 Liveness and Safety
2.2 Abstracting Processes
2.2.1 Process Failures
2.2.2 Arbitrary Faults and Omissions
2.2.3 Crashes
2.2.4 Recoveries
2.3 Abstracting Communication
2.3.1 Link Failures
2.3.2 Fair-Loss Links
2.3.3 Stubborn Links
2.3.4 Perfect Links
2.3.5 Logged Perfect Links
2.3.6 On the Link Abstractions
2.4 Timing Assumptions
2.4.1 Asynchronous System
2.4.2 Synchronous System
2.4.3 Partial Synchrony
2.5 Abstracting Time
2.5.1 Failure Detection
2.5.2 Perfect Failure Detection
2.5.3 Leader Election
2.5.4 Eventually Perfect Failure Detection
2.5.5 Eventual Leader Election
2.6 Distributed System Models
2.6.1 Combining Abstractions
2.6.2 Measuring Performance
2.7 Hands-On
2.7.1 Sendable Event
2.7.2 Message and Extended Message
2.7.3 Fair-Loss Point-to-Point Links
2.7.4 Perfect Point-to-Point Links
2.7.5 Perfect Failure Detector
2.8 Exercises
2.9 Solutions
2.10 Historical Notes
3. Reliable Broadcast
3.1 Motivation
3.1.1 Client-Server Computing
3.1.2 Multi-participant Systems
3.2 Best-Effort Broadcast
3.2.1 Specification
3.2.2 Fail-Silent Algorithm: Basic Broadcast
3.3 Regular Reliable Broadcast
3.3.1 Specification
3.3.2 Fail-Stop Algorithm: Lazy Reliable Broadcast
3.3.3 Fail-Silent Algorithm: Eager Reliable Broadcast
3.4 Uniform Reliable Broadcast
3.4.1 Specification
3.4.2 Fail-Stop Algorithm: All-Ack Uniform Reliable Broadcast
3.4.3 Fail-Silent Algorithm: Majority-Ack Uniform Reliable Broadcast
3.5 Stubborn Broadcast
3.5.1 Overview
3.5.2 Specification
3.5.3 Fail-Recovery Algorithm: Basic Stubborn Broadcast
3.6 Logged Best-Effort Broadcast
3.6.1 Specification