Introduction to Reliable Distributed Programming Rachid Guerraoui · Luís Rodrigues Introduction to Reliable Distributed Programming With31Figures 123 Authors RachidGuerraoui ÉcolePolytechniqueFédérale deLausanne(EPFL) FacultéInformatiqueetCommunications LaboratoireProgrammationDistribué(LPD) Station14 1015Lausanne,Switzerland Rachid.Guerraoui@epfl.ch LuísRodrigues UniversidadeLisboa FaculdadedeCiˆencias DepartamentodeInformática BlocoC6,CampoGrande 1749-016Lisboa,Portugal [email protected] LibraryofCongressControlNumber:2006920522 ACMComputingClassification(1998):C.2,F.2,G.2 ISBN-10 3-540-28845-7 SpringerBerlinHeidelbergNewYork ISBN-13 978-3-540-28845-9 SpringerBerlinHeidelbergNewYork Thisworkissubjecttocopyright.Allrightsarereserved,whetherthewholeorpartofthematerialis concerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation,broadcasting, reproductiononmicrofilmorinanyotherway,andstorageindatabanks.Duplicationofthispublication orpartsthereofispermittedonlyundertheprovisionsoftheGermanCopyrightLawofSeptember9, 1965,initscurrentversion,andpermissionforusemustalwaysbeobtainedfromSpringer.Violations areliableforprosecutionundertheGermanCopyrightLaw. SpringerisapartofSpringerScience+BusinessMedia springer.com ©Springer-VerlagBerlinHeidelberg2006 PrintedinGermany Theuseofgeneraldescriptivenames,registerednames,trademarks,etc.inthispublicationdoesnot imply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevantpro- tectivelawsandregulationsandthereforefreeforgeneraluse. Typesetbytheauthors Production:LE-TEXJelonek,Schmidt&VöcklerGbR,Leipzig Coverdesign:KünkelLopkaWerbeagentur,Heidelberg Printedonacid-freepaper 45/3100/YL-543210 To May, Maria and Sarah. To Hugo and Sara. Preface This book aims at offering an introductory description of distributed pro- gramming abstractions and of the algorithms that are used to implement them in different distributed environments. The reader is provided with an insight into important problems in distributed computing, knowledge about the main algorithmic techniques that can be used to solve these problems, and examples of how to apply these techniques when building distributed applications. Content In modern computing, a program usually executes on several processes: in this context, a process is an abstraction that may represent a computer, a processor within a computer, or simply a specific thread of execution within aprocessor.Thefundamentalproblemindevisingsuchdistributedprograms usually consists in having the processes cooperate on some common task. Of course, traditional centralized algorithmic issues, on each process individu- ally, still need to be dealt with. The added difficulty here is in achieving a robust form of cooperation, despite failures or disconnections of some of the processes, inherent to most distributed environments. Were no notion of cooperation required, a distributed program would simply consist of a set of detached centralized programs, each running on a specific process, and little benefit could be obtained from the availability of several machines in a distributed environment. It was the need for coopera- tion that revealed many of the fascinating problems addressed by this book, problemsthatneedtobesolvedtomakedistributedcomputingareality.The book, not only exposes the reader to these problems but also presents ways to solve them in different contexts. Not surprisingly, distributed programming can be significantly simplified if the difficulty of robust cooperationis encapsulated within specific abstrac- tions. By encapsulating all the tricky algorithmic issues, such distributed programming abstractions bridge the gap between network communication layers, usually frugal in terms of reliability guarantees, and distributed ap- plication layers,usually demanding in terms of reliability. VIII Preface The book presentsvariousdistributed programmingabstractionsandde- scribes algorithmsthatimplement these abstractions.Ina sense,we givethe distributed applicationprogrammeralibraryof abstractioninterfacespecifi- cations, and give the distributed system builder a library of algorithms that implement the specifications. A significant amount of the preparation time of this book was devoted to preparing the exercises and working out their solutions. We strongly en- courage the reader to work out the exercises. We believe that no reasonable understanding can be achieved in a passive way. This is especially true in the fieldofdistributed computing wherea possible underlying anthropomor- phismmayprovideeasybutwrongintuitions.Manyexercisesarerathereasy and can be discussed within anundergraduate teaching classroom.Some ex- ercises are more difficult and need more time. These can typically be given as homeworks. The book comes with a companionset of running examples implemented in the Java programming language, using the Appia protocol composition framework. These examples can be used by students to get a better under- standingofmanyimplementationdetailsthatarenotcoveredinthehigh-level description of the algorithms given in the core of the chapters. Understand- ing such details usually makes a big difference when moving to a practical environment. Instructors can use the protocol layers as a basis for practical experimentations, by suggesting to students to perform optimizations of the protocols already given in the framework, to implement variations of these protocols for different system models, or to develop application prototypes that make use of the protocols. Presentation The book is written in a self-contained manner. This has been made pos- sible because the field of distributed algorithms has reached a certain level of maturity where details, for instance, about the network and various kinds of failures, can be abstracted away when reasoning about the distributed algorithms. Elementary notions of algorithms, first order logics, program- ming languages, networking, and operating systems might be helpful, but we believe that most of our abstractionspecifications and algorithms can be understood with minimal knowledge about these notions. The book follows an incremental approach and was primarily built as a textbook for teaching at the undergraduate or basic graduate level. It intro- duces basic elements of distributed computing in an intuitive manner and builds sophisticated distributed programming abstractions on top of more primitive ones. Whenever we devise algorithms to implement a given ab- straction, we consider a simple distributed system model first, and then we revisitthealgorithmsinmorechallengingmodels.Inotherwords,wefirstde- Preface IX vise algorithmsbymakingstrongsimplifyingassumptionsonthe distributed environment, and then we discuss how to weaken those assumptions. We have tried to balance intuition and presentation simplicity, on the one hand, with rigor, on the other hand. Sometimes rigor was affected, and this might not have been always on purpose. The focus here is rather on ab- straction specifications and algorithms, not on calculability and complexity. Indeed, there is no theorem in this book. Correctness arguments are given with the aim of better understanding the algorithms: they are not formal correctnessproofs per se. In fact, we tried to avoidGreek letters andmathe- maticalnotations:referencesaregiventopapersandbookswithmoreformal treatments of some of the material presented here. Organization • In Chapter 1 we motivate the need for distributed programming abstrac- tions by discussing various applications that typically make use of such abstractions. The chapter also presents the programming notations used in the book to describe specifications and algorithms. • InChapter2wepresentdifferentkindsofassumptionsthatwewillbemak- ingabouttheunderlyingdistributedenvironment,i.e.,wepresentdifferent distributedsystemmodels.Basically,wedescribethebasicabstractionson whichmoresophisticatedonesarebuilt.Theseincludeprocessandcommu- nicationlink abstractions.This chaptermightbe consideredasareference throughout other chapters. Therestofthechaptersareeachdevotedtoonefamilyofrelatedabstractions, and to various algorithms implementing them. We will go from primitive abstractions, and then use them to build more sophisticated ones. • In Chapter 3 we introduce specific distributed programming abstractions: those related to the reliable delivery of messages that are broadcast to a group of processes. We cover here issues such as how to make sure that a message delivered by one process is delivered by all processes, despite the crash of the original sender process. • In Chapter 4 we discuss shared memory abstractions which encapsulate simple formsofdistributedstorageobjectswithread-writesemantics,e.g., files and register abstractions. We cover here issues like how to ensure that a value written (stored) within a set of processes is eventually read (retrieved) despite the crash of some of the processes. • In Chapter 5 we address the consensus abstraction through which a set of processes can decide on a common value, based on values each process initially proposed, despite the crash of some of the processes. • In Chapter 6 we consider variants of consensus such as atomic broadcast, terminating reliable broadcast,(non-blocking) atomic commitment, group membership, and view-synchronous communication. X Preface Thedistributedalgorithmswewillstudydiffernaturallyaccordingtotheac- tualabstractiontheyaimatimplementing,butalsoaccordingtotheassump- tions ontheunderlyingdistributedenvironment(we willalsosaydistributed systemmodel),i.e.,accordingtotheinitialabstractionstheytakeforgranted. Aspectssuchasthereliabilityofthelinks,thedegreeofsynchronyofthesys- tem, and whether a deterministic or a randomized (probabilistic) solution is sought have a fundamental impact on how the algorithm is designed. To give the reader an insight into how these parameters affect the algo- rithm design, the book includes several classes of algorithmic solutions to implement the same distributed programming abstractions for various dis- tributed system models. Covering all chapters, with their associated exercises, constitutes a full course in the field. Focusing on each chapter solely for the specifications of the abstractions and their underlying algorithms in their simplest form, i.e., for the simplest model of computation considered in the book (fail-stop), would constitute a shorter, more elementary course. Such a course could provide a nice companion to a more practice-oriented course possibly based on our protocol framework. References We have been exploring the world of distributed programming abstractions formorethanadecadenow.Thematerialofthisbookwasinfluencedbymany researchers in the field of distributed computing. A special mention is due to Leslie Lamport and Nancy Lynch for having posed fascinating problems in distributed computing, and to the Cornell school of reliable distributed computing,includingOzalpBabaoglu,KenBirman,KeithMarzullo,Robbert van Rennesse, Rick Schlicting, Fred Schneider, and Sam Toueg. Manyotherresearchershavedirectlyorindirectlyinspiredthematerialof this book.Wedidourbesttoreferencetheir workthroughoutthe text.Most chapters end with a historicalnote. This intends to provide hints for further readings, to trace the history of the concepts presented in the chapters, as wellastogivecreditstothosewhoinventedandworkedouttheconcepts.At the end of the book, we reference other books for further readings on other aspects of distributed computing. Acknowledgments We would like to express our deepest gratitude to our undergraduate and graduate students from Ecole Polytechnique F´ed´erale de Lausanne (EPFL) and the University of Lisboa (UL), for serving as reviewers of preliminary drafts of this book. Indeed, they had no choice and needed to prepare their Preface XI examsanyway!Buttheywereindulgenttowardthebugsandtyposthatcould be found in earlierversionsof the book as well as associatedslides,and they provided us with useful feedback. ParthaDutta, CorineHari,MichalKapalka,PetrKouznetsov,RonLevy, MaximeMonod,BastianPochon,andJesperSpring,graduatestudents from theSchoolofComputerandCommunicationSciencesofEPFL,FilipeArau´jo and Hugo Miranda, graduate students from the Distributed Algorithms and Network Protocol (DIALNP) group at the Departamento de Informa´tica da Faculdade de Ciˆencias da Universidade de Lisboa (UL), Leila Khalil and Robert Basmadjian, graduate students from the Lebanese University inBeirut,aswellasAliGhodsi,graduatestudentfromtheSwedishInstitute of ComputerScience (SICS) inStockholm,suggestedmany improvementsto the algorithms presented in the book. Severalimplementations for the “hands-on” part of the book were devel- oped by, or with the help of, Alexandre Pinto, a key member of the Appia team, complemented with inputs from several DIALNP team members and students,includingNunoCarvalho,MariaJoa˜oMonteiro,andLu´ısSardinha. Finally,wewouldliketothankallourcolleagueswhowerekindenoughto commentonearlierdraftsof this book.These include Felix Gaertner,Benoit Garbinato and Maarten van Steen. Rachid Guerraoui and Lu´ıs Rodrigues