Michel Raynal
Fault-Tolerant Message-Passing Distributed Systems
An Algorithmic Approach

IRISA-ISTIC
Université de Rennes 1
Institut Universitaire de France
Rennes, France

Parts of this work are based on the books “Fault-Tolerant Agreement in Synchronous Message-Passing Systems” and “Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems”, author Michel Raynal, © 2010 Morgan & Claypool Publishers (www.morganclaypool.com). Used with permission.

ISBN 978-3-319-94140-0
ISBN 978-3-319-94141-7 (eBook)
https://doi.org/10.1007/978-3-319-94141-7
Library of Congress Control Number: 2018953101

© Springer Nature Switzerland AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

La recherche du temps perdu passait par le Web. [...] La mémoire était devenue inépuisable, mais la profondeur du temps [...] avait disparu. On était dans un présent infini.
English: The search for lost time went through the Web. [...] Memory had become inexhaustible, but the depth of time [...] had disappeared. We were in an infinite present.
In Les années (2008), Annie Ernaux (1940)

Sed nos immensum spatiis confecimus aequor,
Et iam tempus equum fumantia solvere colla.¹
In Georgica, Liber II, 541-542, Publius Virgilius (70 BC–19 BC)

Je suis arrivé au jour où je ne me souviens plus quand j'ai cessé d'être immortel.
English: I have reached the day when I no longer remember when I ceased to be immortal.
In Livro de Crónicas, António Lobo Antunes (1942)

C'est une chose étrange à la fin que le monde
Un jour je m'en irai sans en avoir tout dit.
English: It is a strange thing, in the end, the world; one day I shall leave without having said everything about it.
In Les yeux et la mémoire (1954), chant II, Louis Aragon (1897–1982)

Tout garder, c'est tout détruire.
English: To keep everything is to destroy everything.
Jacques Derrida (1930–2004)

¹ French: Mais j'ai déjà fourni une vaste carrière, il est temps de dételer les chevaux tout fumants.
English: But now I have traveled a very long way, and the time has come to unyoke my steaming horses.

What is distributed computing? Distributed computing was born in the late 1970s when researchers and practitioners started taking into account the intrinsic characteristic of physically distributed systems. The field then emerged as a specialized research area distinct from networking, operating systems, and parallel computing.

Distributed computing arises when one has to solve a problem in terms of distributed entities (usually called processors, nodes, processes, actors, agents, sensors, peers, etc.) such that each entity has only a partial knowledge of the many parameters involved in the problem that has to be solved. While parallel computing and real-time computing can be characterized, respectively, by the terms efficiency and on-time computing, distributed computing can be characterized by the term uncertainty. This uncertainty is created by asynchrony, multiplicity of control flows, absence of shared memory and global time, failure, dynamicity, mobility, etc.
Mastering one form or another of uncertainty is pervasive in all distributed computing problems. A main difficulty in designing distributed algorithms comes from the fact that no entity cooperating in the achievement of a common goal can have an instantaneous knowledge of the current state of the other entities; it can only know their past local states.

Although distributed algorithms are often made up of a few lines, their behavior can be difficult to understand and their properties hard to state and prove. Hence, distributed computing is not only a fundamental topic but also a challenging topic where simplicity, elegance, and beauty are first-class citizens.

Why this book? In the book “Distributed algorithms for message-passing systems” (Springer, 2013), I addressed distributed computing in failure-free message-passing systems, where the computing entities (processes) have to cooperate in the presence of asynchrony. Differently, in my book “Concurrent programming: algorithms, principles and foundations” (Springer, 2013), I addressed distributed computing where the computing entities (processes) communicate through a read/write shared memory (e.g., multicore), and the main adversary lies in the net effect of asynchrony and process crashes (unexpected definitive stops).

The present book considers synchronous and asynchronous message-passing systems, where processes can commit crash failures, or Byzantine failures (arbitrary behavior). Its aim is to present in a comprehensive way basic notions, concepts and algorithms in the context of these systems. The main difficulty comes from the uncertainty created by the adversaries managing the environment (mainly asynchrony and failures), which, by its very nature, is not under the control of the system.

A quick look at the content of the book The book is composed of four parts: the first two are on communication abstractions, the other two on agreement abstractions. Those are the most important abstractions distributed applications rely on in asynchronous and synchronous message-passing systems where processes may crash, or commit Byzantine failures. The book addresses what can be done and what cannot be done in the presence of such adversaries.
It consequently presents both impossibility results and distributed algorithms. All impossibility results are proved, and all algorithms are described in a simple algorithmic notation and proved correct.

• Parts on communication abstractions.
  – Part I is on the reliable broadcast abstraction.
  – Part II is on the construction of read/write registers.
• Parts on agreement.
  – Part III is on agreement in synchronous systems.
  – Part IV is on agreement in asynchronous systems.

On the presentation style When known, the names of the authors of a theorem, or of an algorithm, are indicated together with the date of the associated publication. Moreover, each chapter has a bibliographical section, where a short historical perspective and references related to that chapter are given. Each chapter terminates with a few exercises and problems, whose solutions can be found in the article cited at the end of the corresponding exercise/problem.

From a vocabulary point of view, the following terms are used: an object implements an abstraction, defined by a set of properties, which allows a problem to be solved. Moreover, each algorithm is first presented intuitively with words, and then proved correct. Understanding an algorithm is a two-step process:

• First have a good intuition of its underlying principles, and its possible behaviors. This is necessary, but remains informal.
• Then prove the algorithm is correct in the model it was designed for. The proof consists in a logical reasoning, based on the properties provided by (i) the underlying model, and (ii) the statements (code) of the algorithm. More precisely, each property defining the abstraction the algorithm is assumed to implement must be satisfied in all its executions.

Only when these two steps have been done, can we say that we understand the algorithm.

Audience This book has been written primarily for people who are not familiar with the topic and the concepts that are presented. These include mainly:

• Senior-level undergraduate students and graduate students in informatics or computing engineering, who are interested in the principles and algorithmic foundations of fault-tolerant distributed computing.
• Practitioners and engineers who want to be aware of the state-of-the-art concepts, basic principles, mechanisms, and techniques encountered in fault-tolerant distributed computing.

Prerequisites for this book include undergraduate courses on algorithms, basic knowledge on operating systems, and notions on concurrency in failure-free distributed computing. One-semester courses, based on this book, are suggested in the section titled “How to Use This Book” in the Afterword.

Origin of the book and acknowledgments This book has two complementary origins:

• The first is a set of lectures for undergraduate and graduate courses on distributed computing I gave at the University of Rennes (France), the Hong Kong Polytechnic University, and, as an invited professor, at several universities all over the world.
  Hence, I want to thank the numerous students for their questions that, in one way or another, contributed to this book.
• The second is the two monographs I wrote in 2010, on fault-tolerant distributed computing, titled “Communication and agreement abstractions for fault-tolerant asynchronous distributed systems”, and “Fault-tolerant agreement in synchronous distributed systems”. Parts of them appear in this book, after having been revised, corrected, and improved.
  Hence, I want to thank Morgan & Claypool, and more particularly Diane Cerra, for their permission to reuse parts of this work.

I also want to thank my colleagues (in no particular order) A. Mostéfaoui, D. Imbs, S. Rajsbaum, V. Gramoli, C. Delporte, H. Fauconnier, F. Taïani, M. Perrin, A. Castañeda, M. Larrea, and Z. Bouzid, with whom I collaborated in the recent past years. I also thank the Polytechnic University of Hong Kong (PolyU), and more particularly Professor Jiannong Cao, for hosting me while I was writing parts of this book. My thanks also to Ronan Nugent (Springer) for his support and his help in putting it all together.

Last but not least (and maybe most importantly), I thank all the researchers whose results are presented in this book. Without their work, this book would not exist. (Finally, since I typeset the entire text myself – LaTeX 2ε for the text and xfig for figures – any typesetting or technical errors that remain are my responsibility.)
Professor Michel Raynal
Academia Europaea
Institut Universitaire de France
Professor IRISA-ISTIC, Université de Rennes 1, France
Chair Professor, Hong Kong Polytechnic University

June–December 2017
Rennes, Saint-Grégoire, Douelle, Saint-Philibert, Hong Kong,
Vienna (DISC'17), Washington D.C. (PODC'17), Mexico City (UNAM)

Contents

I Introductory Chapter

1 A Few Definitions and Two Introductory Examples
  1.1 A Few Definitions Related to Distributed Computing
  1.2 Example 1: Common Decision Despite Message Losses
    1.2.1 The Problem
    1.2.2 Trying to Solve the Problem: Attempt 1
    1.2.3 Trying to Solve the Problem: Attempt 2
    1.2.4 An Impossibility Result
    1.2.5 A Coordination Problem
  1.3 Example 2: Computing a Global Function Despite a Message Adversary
    1.3.1 The Problem
    1.3.2 The Notion of a Message Adversary
    1.3.3 The TREE-AD Message Adversary
    1.3.4 From Message Adversary to Process Mobility
  1.4 Main Distributed Computing Models Used in This Book
  1.5 Distributed Computing Versus Parallel Computing
  1.6 Summary
  1.7 Bibliographic Notes
  1.8 Exercises and Problems

II The Reliable Broadcast Communication Abstraction

2 Reliable Broadcast in the Presence of Process Crash Failures
  2.1 Uniform Reliable Broadcast
    2.1.1 From Best Effort to Guaranteed Reliability
    2.1.2 Uniform Reliable Broadcast (URB-broadcast)
    2.1.3 Building the URB-broadcast Abstraction in CAMP_{n,t}[∅]
  2.2 Adding Quality of Service
    2.2.1 “First In, First Out” (FIFO) Message Delivery
    2.2.2 “Causal Order” (CO) Message Delivery
    2.2.3 From FIFO-broadcast to CO-broadcast
    2.2.4 From URB-broadcast to CO-broadcast: Capturing Causal Past in a Vector
    2.2.5 The Total Order Broadcast Abstraction Requires More
  2.3 Summary
  2.4 Bibliographic Notes
  2.5 Exercises and Problems

3 Reliable Broadcast in the Presence of Process Crashes and Unreliable Channels
  3.1 A System Model with Unreliable Channels
    3.1.1 Fairness Notions for Channels
    3.1.2 Fair Channel (FC) and Fair Lossy Channel
    3.1.3 Reliable Channel in the Presence of Process Crashes
    3.1.4 System Model
  3.2 URB-broadcast in CAMP_{n,t}[-FC]
    3.2.1 URB-broadcast in CAMP_{n,t}[-FC, t < n/2]
    3.2.2 An Impossibility Result
  3.3 Failure Detectors: an Approach to Circumvent Impossibilities
    3.3.1 The Concept of a Failure Detector
    3.3.2 Formal Definitions
  3.4 URB-broadcast in CAMP_{n,t}[-FC] Enriched with a Failure Detector
    3.4.1 Definition of the Failure Detector Class Θ
    3.4.2 Solving URB-broadcast in CAMP_{n,t}[-FC, Θ]
    3.4.3 Building a Failure Detector Θ in CAMP_{n,t}[-FC, t < n/2]
    3.4.4 The Fundamental Added Value Supplied by a Failure Detector
  3.5 Quiescent Uniform Reliable Broadcast
    3.5.1 The Quiescence Property
    3.5.2 Quiescent URB-broadcast Based on a Perfect Failure Detector
    3.5.3 The Class HB of Heartbeat Failure Detectors
    3.5.4 Quiescent URB-broadcast in CAMP_{n,t}[-FC, Θ, HB]
  3.6 Summary
  3.7 Bibliographic Notes
  3.8 Exercises and Problems

4 Reliable Broadcast in the Presence of Byzantine Processes
  4.1 Byzantine Processes and Properties of the Model BAMP_{n,t}[t < n/3]
  4.2 The No-Duplicity Broadcast Abstraction
    4.2.1 Definition
    4.2.2 An Impossibility Result
    4.2.3 A No-Duplicity Broadcast Algorithm
  4.3 The Byzantine Reliable Broadcast Abstraction
  4.4 An Optimal Byzantine Reliable Broadcast Algorithm
    4.4.1 A Byzantine Reliable Broadcast Algorithm for BAMP_{n,t}[t < n/3]
    4.4.2 Correctness Proof
    4.4.3 Benefiting from Message Asynchrony
  4.5 Time and Message-Efficient Byzantine Reliable Broadcast
    4.5.1 A Message-Efficient Byzantine Reliable Broadcast Algorithm
    4.5.2 Correctness Proof
  4.6 Summary
  4.7 Bibliographic Notes
  4.8 Exercises and Problems

III The Read/Write Register Communication Abstraction

5 The Read/Write Register Abstraction
  5.1 The Read/Write Register Abstraction
    5.1.1 Concurrent Objects and Registers