Time-Efficient Read/Write Register in Crash-prone Asynchronous Message-Passing Systems AchourMostéfaoui†,MichelRaynal⋆,‡ †LINA,UniversitédeNantes,44322Nantes, France ⋆InstitutUniversitairedeFrance ‡IRISA, UniversitédeRennes, 35042Rennes, France 6 1 0 Tech Report #2031,14pages, January 2016 2 IRISA, UniversityofRennes 1,France n a J 9 Abstract 1 Theatomicregisteriscertainlythemostbasicobjectofcomputingscience. Itsimplementation ] ontopofann-processasynchronousmessage-passingsystemhasreceivedalotofattention. Ithas C beenshownthatt<n/2(wheretisthemaximalnumberofprocessesthatmaycrash)isanecessary D andsufficientrequirementtobuildanatomicregisterontopofacrash-proneasynchronousmessage- . s passingsystem. Consideringsuchacontext,thispapervisitsthenotionofafastimplementationof c anatomicregister,andpresentsanewtime-efficientasynchronousalgorithm. Itstime-efficiencyis [ measuredaccordingtotwodifferentunderlyingsynchronyassumptions.Whateverthisassumption, 1 a write operationalwayscostsa round-tripdelay, whilea readoperationcostsalwaysa round-trip v delayinfavorablecircumstances(intuitively,whenitisnotconcurrentwithawrite). Whendesign- 0 ing this algorithm, the design spirit was to be as close as possible to the one of the famous ABD 2 8 algorithm(proposedbyAttiya,Bar-Noy,andDolev). 4 0 Keywords: Asynchronousmessage-passingsystem,Atomicread/writeregister,Concurrency,Fast . 1 operation,Processcrashfailure,Synchronousbehavior,Time-efficientoperation. 0 6 1 : v i X r a 1 1 Introduction SinceSumertime[7],and–muchlater–Turing’smachinetape[13],read/writeobjects arecertainly the mostbasicmemory-basedcommunicationobjects. Suchanobject,usuallycalledaregister,providesits users(processes)withawriteoperationwhichdefinesthenewvalueoftheregister,andareadoperation which returns the value of the register. When considering sequential computing, registers are universal inthesensethattheyallowtosolveanyproblemthatcanbesolved[13]. Registerinmessage-passingsystems Inamessage-passingsystem,thecomputingentitiescommuni- cate only bysending and receiving messages transmitted through acommunication network. Hence, in such asystem, aregister isnotacommunication object givenforfree, but constitutes acommunication abstraction whichmustbebuiltwiththehelpofthecommunication networkandthelocal memoriesof theprocesses. Several types of registers can be defined according to which processes are allowed to read or write it, and the quality (semantics) of the value returned by each read operation. We consider here registers whicharesingle-writer multi-reader (SWMR),andatomic. Atomicitymeansthat(a)eachreadorwrite operation appears as if it had been executed instantaneously at a single point of the time line, between is start event and its end event, (b) no two operations appear at the same point of the time line, and (c) a read returns the value written by the closest preceding write operation (or the initial value of the register if there is no preceding write) [8]. Algorithms building multi-writer multi-reader (MWMR) atomic registers from single-writer single-reader (SWSR) registers with a weaker semantics (safe or regularregisters)aredescribed inseveraltextbooks (e.g.,[3,9,12]). Manydistributedalgorithmshavebeenproposed,whichbuildaregisterontopofamessage-passing system,beitfailure-free orfailure-prone. Inthefailure-prone case,theaddressed failuremodelsarethe process crash failure model, or the Byzantine process failure model (see, the textbooks [3, 9, 10, 11]). The most famous of these algorithms was proposed by H. Attiya, A. Bar-Noy, and D. Dolev in [2]. Thisalgorithm,whichisusuallycalledABDaccordingtothenamesitsauthors, considersann-process asynchronous system inwhichuptot < n/2processes maycrash (itisalso shownin[2]that t < n/2 is an upper bound of the number of process crashes which can be tolerated). This simple and elegant algorithm, relies on (a) quorums [14], and (b) a simple broadcast/reply communication pattern. ABD uses this pattern once in a write operation, and twice in a read operation implementing an SWMR register. Fastoperation Toourknowledge,thenotionofafastimplementationofanatomicregisteroperation, infailure-proneasynchronousmessage-passingsystems,wasintroducedin[5]forprocesscrashfailures, andin[6]forByzantineprocessfailures. Thesepapersconsiderathree-componentmodel,namelythere arethreedifferenttypesofprocesses: asetofwritersW,asetofreadersR,andasetofserversS which implementstheregister. Moreover,aclient(awriterorareader)cancommunicateonlywiththeservers, andtheserversdonotcommunicate amongthemselves. In these papers, fast means that a read or write operation must entail exactly one communication round-trip delaybetweenaclient(thewriterorareader)andtheservers. Whenconsidering theprocess crash failure model (the one we are interested in in this paper), it is shown in [5] that, when (|W| = 1)∧(t ≥ 1)∧(|R| ≥ 2), the condition (|R| < |S| −2) is necessary and sufficient to have fast read t andwriteoperations (asdefinedabove),whichimplementanatomicregister. Itisalsoshownin[5]that thereisnofastimplementation ofanMWMRatomicregisterif(cid:0)(|W| ≥2)∧(|R| ≥ 2)∧(t ≥ 1)(cid:1). Content of the paper The work described in [5, 6] is mainly on the limits of the three-component model (writers, readers, and servers constitute three independent sets of processes) in the presence of process crash failures, or Byzantine process failures. These limits are captured by predicates involving 2 thesetofwriters(W),thesetofreaders (R),thesetofservers (S),andthemaximalnumberofservers that can be faulty (t). Both the underlying model used in this paper and its aim are different from this previous work. Whilekeepingthespirit(basicprinciplesandsimplicity)ofABD,ouraimistodesignatime-efficient implementation of an atomic register in the classical model used in many articles and textbooks (see, e.g., [2, 3, 9, 12]). This model, where any process can communicate with any process, can be seen as a peer-to-peer model in which each process is both a client (it can invoke operations) and a server (it managesalocalcopyoftheregisterthatisbuilt).1 Adopting the usual distributed computing assumption that (a) local processing times are negligible andassumedconsequently tohavezeroduration, and(b)onlycommunication takestime,thispaperfo- cusesonthecommunication timeneededtocompleteareadorwriteoperation. Forthisreasontheterm time-efficiency is defined here in terms on message transfer delays, namely, the cost of a read or write operationismeasuredbythenumberof“consecutive”messagetransferdelaystheyrequiretoterminate. Letusnotice that thisincludes transfer delays duetocausally related messages (for exampleround trip delays generated by request/acknowledgment messages), but also (as wewillsee in the proposed algo- rithm)messagetransferdelayswhichoccursequentially withoutbeingnecessarily causally related. Let usnoticethatthisnotionofatime-efficient operation doesnotinvolve themodelparameter t. In order to give a precise meaning to the notion of a “time-efficient implementation” of a register operation, this paper considers two distinct ways to measure the duration of read and write operations, each based on a specific additional synchrony assumption. One is the “bounded delay” assumption, the other one the “round-based synchrony” assumption. More precisely, these assumptions and the associated time-efficiency oftheproposed algorithm arethefollowing. • Boundeddelayassumption. Let us assume that every message takes at most ∆ time units to be transmitted from its sender to any of its receivers. In such a context, the algorithm presented in the paper has the following time-efficiency properties. – Awriteoperation takesatmost2∆timeunits. – A read operation which is write-latency-free takes at most 2∆ time units. (The notion of write-latency-freedom isdefinedinSection3. Intuitively, itcaptures thefactthatthebehav- iorofthereaddoesnotdepend onaconcurrent orfaulty writeoperation, whichistheusual case in read-dominated applications.) Otherwise, it takes at most 3∆ time units, except in thecasewherethereadoperationisconcurrentwithawriteoperationandthewritercrashes during this write, where it can take up to 4∆ time units. (Let us remark that a process can experience atmostoncethe4∆readoperation scenario.) • Round-based synchrony assumption. Here,theunderlyingcommunicationsystemisassumedtoberound-basedsynchronous[3,8,11]. In such asystem, the processes progress by executing consecutive synchronous rounds. Inevery round, according to its code, a process possibly sends a message to a subset of processes, then receives all the messages sent to it during the current round, and finally executes local computa- tion. Attheendofaround,allprocessesaredirectedtosimultaneouslyprogresstothenextround. In such a synchronous system, everything appears as if all messages take the very same time to go from their sender to theirs receivers, namely the duration δ associated with a round. When executedinsuchacontext, theproposed algorithm hasthefollowingtime-efficiencyproperties. – Theduration ofawriteoperation is2δ timeunits. 1Considering thethree-component model whereeachreader isalsoaserver (i.e., R = S), weobtain atwo-component modelwithonewriterandreader-serverprocesses. Inthismodel,thenecessaryandsufficientcondition(|R|< |S| −2)can t never besatisfied, whichmeansthat, itisimpossibletodesignafastimplementationof aSWMRatomicregisterinsucha two-componentmodel. 3 – The duration of a read operation is 2δ time units, except possibly in the specific scenario where the writer crashes while executing the write operation concurrently with the read, in which case theduration of the read can be 3δ timeunits (as previously, let us remark that a process canexperience atmostoncethe3δ readoperation scenario.) Hence, while it remains correct in the presence of any asynchronous message pattern (e.g., when each message takes one more time unit than any previous message), the proposed algorithm is particu- larlytime-efficientwhen“good”scenariosoccur. Thosearetheonesdefinedbytheprevioussynchrony patternswherethedurationofareadorawriteoperationcorrespondstoasingleround-tripdelay. More- over,intheothersynchronous scenarios,whereareadoperationisconcurrentwithawrite,themaximal duration of the read operation is precisely quantified. A concurrent write adds uncertainty whose reso- lutionbyareadoperationrequiresonemoremessagetransferdelay(twointhecaseofthe∆synchrony assumption, iftheconcurrent writecrashes). Roadmap Thepaperconsistsof6sections. Section2presentsthesystemmodel. Section3definesthe atomic register abstraction, and thenotion ofatime-efficient implementation. Then, Section 4presents anasynchronous algorithm providing animplementation ofanatomic register withtime-efficient oper- ations, aspreviously defined. Section5provesitsproperties. Finally,Section6concludes thepaper. 2 System Model Processes The computing model is composed of a set of n sequential processes denoted p1, ..., pn. Eachprocessisasynchronous whichmeansthatitproceedsatitsownspeed,whichcanbearbitraryand remainsalwaysunknowntotheotherprocesses. A process may halt prematurely (crash failure), but executes correctly its local algorithm until it possibly crashes. Themodel parameter tdenotes themaximal number ofprocesses thatmaycrash ina run. Aprocess thatcrashesinarunissaidtobefaulty. Otherwise,itiscorrectornon-faulty. Communication The processes cooperate by sending and receiving messages through bi-directional channels. The communication network is a complete network, which means that any process p can i directlysendamessagetoanyprocessp (includingitself). Eachchannelisreliable(noloss,corruption, j nor creation of messages), not necessarily first-in/first-out, and asynchronous (while the transit time of eachmessageisfinite,thereisnoupperboundonmessagetransittimes). A process pi invokes the operation “send TAG(m) to pj” to send pj the message tagged TAG and carrying thevaluem. Itreceivesamessagetagged TAG byinvoking theoperation “receive TAG()”. The macro-operation “broadcast TAG(m)” is a shortcut for “for each j ∈ {1,...,n} send TAG(m) to pj end for”. (The sending order is arbitrary, which means that, if the sender crashes while executing this statement, anarbitrary –possibly empty–subsetofprocesses willreceivethemessage.) Letusnoticethat, duetoprocess andmessage asynchrony, noprocess canknowifanotherprocess crashedorisonlyveryslow. Notation In the following, the previous computation model, restricted to the case where t < n/2, is denoted CAMP [t < n/2](CrashAsynchronous Message-Passing). n,t It is important to notice that, in this model, all processes are a priori “equal”. As we will see, this allowseachprocess tobeatthesametimea“client” anda“server”. Inthissense, andasnoticed inthe Introduction, this model is the “fully connected peer-to-peer” model (whose structure is different from othercomputingmodelssuchastheclient/servermodel,whereprocessesarepartitionedintoclientsand servers, playingdifferentroles). 4 3 Atomic Register and Time-efficient Implementation 3.1 Atomicregister A concurrent object is an object that can be accessed by several processes (possibly simultaneously). An SWMR atomic register (say REG) is a concurrent object which provides exactly one process (calledthewriter)withanoperation denotedREG.write(),andallprocesses withanoperation denoted REG.read(). Whenthe writerinvokes REG.write(v) itdefines v asbeing the new value ofREG. An SWMR atomic register (we also say the register is linearizable [4]) is defined by the following set of properties [8]. • Liveness. Aninvocation ofanoperation byacorrect processterminates. • Consistency(safety). Alltheoperationsinvokedbytheprocesses,exceptpossibly–foreachfaulty process– the last operation it invoked, appear as if they have been executed sequentially and this sequence ofoperations issuchthat: – each read returns the value written by the closest write that precedes it (or the initial value ofREG ifthereisnopreceding write), – ifanoperation op1terminated before anoperation op2started, thenop1appears beforeop2 inthesequence. Thisset ofproperties states that, from anexternal observer point ofview, theobject appears asifit wasaccessed sequentially bythe processes, this sequence (a) respecting thereal timeaccess order, and (ii)belonging tothesequential specification ofaread/writeregister. 3.2 Notionofatime-efficient operation Thenotionofatime-efficientoperationisnotrelatedtoitscorrectness,butisapropertyofitsimplemen- tation. Itissometimescallednon-functional property. Inthepresentcase,itcapturesthetimeefficiency ofoperations.2 As indicated in the introduction, we consider here two synchrony assumptions to define what we mean by time-efficient operation implementation. As we have seen, both are based on the duration of readandwriteoperations, intermsofmessagetransferdelays. Letusrememberthat,inbothcases,itis assumedthatthelocalprocessing timesneeded toimplementthesehighlevelreadandwriteoperations arenegligible. 3.2.1 Boundeddelay-based definitionofatime-efficientimplementation Letus assume an underlying communication system where message transfer delays are upper bounded by∆. Write-latency-freereadoperationandinterferingwrite Intuitively,areadoperationiswrite-latency- free if its execution does “not interleave” with the execution of a write operation. More precisely, let τ be the starting time of a read operation. This read operation is write-latency-free if (a) it is not con- r current with a write operation, and (b) the closest preceding write did not crash and started at a time τ < τ −∆. w r Letopr beareadoperation, whichstarted attimeτ . Letopw betheclosest writepreceding opr. If r opwstartedattimeτ ≥ τ −∆,itissaidtobeinterfering withopr. w r 2Anotherexampleofanon-functionalpropertyisquiescence. Thispropertyisonalgorithmsimplementingreliablecom- municationontopofunreliablenetworks[1]. Itstatesthatthenumberofunderlyingimplementationmessagesgeneratedby anapplicationmessagemustbefinite. Hence,ifthereisatimeafterwhichnoapplicationprocesssendsmessages,thereisa timeafterwhichthesystemisquiescent. 5 Bounded delay-based definition An implementation of a read/write register is time-efficient (from a bounded delaypointofview)ifitsatisfiesthefollowingproperties. • Awriteoperation takesatmost2∆timeunits. • Areadoperation whichiswrite-latency-free takesatmost2∆timeunits. • Areadoperation whichisnotwrite-latency-free takesatmost – 3∆timeunitsifthewriterdoesnotcrashwhileexecutingtheinterfering write, – 4∆ time units if the writer crashes while executing the interfering write (this scenario can appearatmostonceforeachprocess). 3.2.2 Roundsynchrony-based definitionofatime-efficientimplementation Letusassumethattheunderlyingcommunicationsystemisround-based synchronous, whereeachmes- sagetransferdelayisequaltoδ. Whenconsideringthisunderlyingsynchronyassumption,itisassumed thataprocess sendsorbroadcasts atmostonemessage perround, andthisisdoneatthebeginning ofa round. An implementation of a read/write register is time-efficient (from the round-based synchrony point ofview)ifitsatisfiesthefollowingproperties. • Theduration ofawriteoperation is2δ timeunits. • The duration of a read operation is 2δ time units, except possibly in the “at most once” scenario wherethewritercrasheswhileexecuting thewriteoperation concurrently withtheread,inwhich casetheduration ofthereadcanbe3δ timeunits. What does the proposed algorithm As we will see, the proposed algorithm, designed for the asyn- chronoussystemmodelCAMP [t < n/2],providesanSWMRatomicregisterimplementationwhich n,t istime-efficient forboth its “bounded delay”-based definition, and its“round synchrony”-based defini- tion. 4 An Algorithm with Time-efficient Operations The design of the algorithm, described in Figure 1, is voluntarily formulated to be as close as possible toABD.ForthereaderawareofABD,thiswillhelpitsunderstanding. Localvariables Eachprocessp managesthefollowinglocalvariables. i • reg containsthevalueoftheconstructedregisterREG,ascurrentlyknownbyp . Itisinitialized i i totheinitialvalueofREG (e.g.,thedefaultvalue⊥). • wsn isthesequence numberassociated withthevalueinreg . i i • rsn isthesequence numberofthelastreadoperation invoked byp . i i • swsn is a synchronization local variable. It contains the sequence number of the most recent i value of REG that, to p ’s knowledge, is known by at least (n − t) processes. This variable i (whichisnewwithrespecttootheralgorithms)isattheheartofthetime-efficientimplementation ofthereadoperation. • res isthevalueofREG whosesequence numberisswsn . i i 6 localvariablesinitialization:regi←⊥;wsni ←0;swsni←0;rsni ←0. operationwrite(v)is (1) wsni ←wsni+1;regi←v;broadcastWRITE(wsni,v); (2) wait(cid:0)WRITE(wsni,−)receivedfrom(n−t)differentprocesses(cid:1); (3) return() endoperation. operationread()is%thewritermaydirectlyreturnreg % i (4) rsni ←rsni+1;broadcastREAD(rsni); (5) wait(cid:0)(messagesSTATE(rsn,−)receivedfrom(n−t)differentprocesses)∧(swsni≥maxwsn) wheremaxwsnisthegreatestsequencenumberinthepreviousSTATE(rsn,−)messages(cid:1); (6) return(resi) endoperation. %———————————————————————————————————————– whenWRITE(wsn,v)isreceiveddo (7) if(wsn>wsni)thenregi←v;wsni ←wsnendif; (8) if(notyetdone)thenbroadcastWRITE(wsn,v)endif; (9) if(cid:0)WRITE(wsn,−)receivedfrom(n−t)differentprocesses(cid:1) (10) thenif(wsn>swsni)∧(notalreadydone)thenswsni ←wsn;resi←vendif (11) endif. whenREAD(rsn)isreceivedfrompj do (12) sendSTATE(rsn,wsni)topj. Figure1: Time-efficientSWMRatomicregisterinAMP [t < n/2] n,t Client side: operation write() invoked by the writer When the writer p invokes REG.write(v), it i increases wsni,updates regi,andbroadcasts themessage WRITE(wsni,v)(line1). Then,itwaitsuntil it has received an acknowledgment message from (n − t) processes (line 2). When this occurs, the operationterminates(line3). Letusnoticethattheacknowledgmentmessageisacopyoftheverysame messageastheoneitbroadcast. Server side: reception ofamessage write(wsn,v) whenaprocess p receives such amessage, and i this message carries a more recent value than the one currently stored in reg , p updates accordingly i i wsn andreg (line7). Moreover,ifthismessageisthefirstmessagecarryingthesequencenumberwsn, i i pi forwardstoalltheprocesses themessage WRITE(wsn,v) ithasreceived (line8). Thisbroadcast has two aims: to be an acknowledgment for the writer, and to inform the other processes that p “knows” i thisvalue.3 Moreover, whenpi hasreceived themessage WRITE(wsn,v) from (n−t)different processes, and swsn issmallerthanwsn,itupdates itslocalsynchronization variable swsn andaccordingly assigns i i v tores (lines9-11). i Server side: reception ofamessage READ(rsn) Whenaprocess pi receives such amessage from a processpj,itsendsbyreturntopj themessageSTATE(rsn,wsni),therebyinformingitonthefreshness of the last value of REG it knows (line 12). The parameter rsn allows the sender p to associate the j messages STATE(rsn,−) itwillreceivewiththecorresponding requestidentified byrsn. Clientside: operation read()invoked byaprocess p When aprocess invokes REG.read(), itfirst i broadcaststhemessageREAD(rsni)withanewsequencenumber. Then,itwaitsuntil“some”predicate 3Letusobservethat,duetoasynchrony,itispossiblethatwsni > wsnwhenpireceivesamessageWRITE(wsn,v)for thefirsttime. 7 issatisfied (line5),andfinallyreturns thecurrent value ofres . Letusnotice thatthevalueres thatis i i returned istheonewhosesequence numberisswsn . i The waiting predicate is the heart of the algorithm. Its first part states that p must have received a i messageSTATE(rsn,−)from(n−t)processes. Itssecondpart,namely(swsni ≥ maxwsn),statesthat thevalueinp ’slocalvariableres isasrecentormorerecentthanthevalueassociatedwiththegreatest i i writesequence numberwsnreceived bypi inamessage STATE(rsn,−). Combinedwiththebroadcast of messages WRITE(wsn,−) issued by each process at line 8, this waiting predicate ensures both the correctness ofthereturnedvalue(atomicity), andthefactthatthereadimplementation istime-efficient. 5 Proof of the Algorithm 5.1 Termination andatomicity Theproperties proved inthis section are independent ofthe message transfer delays (provided they are finite). Lemma1 Ifthe writer is correct, all its write invocations terminate. If areader is correct, all its read invocations terminate. Proof Letusfirstconsider thewriterprocess. Asbyassumption itiscorrect, itbroadcasts themessage WRITE(sn,−) (line 1). Each correct process broadcasts WRITE(sn,−) when it receives it for the first time(line8). Asthereareatleast(n−t)correctprocesses,thewritereventuallyreceivesWRITE(sn,−) fromtheseprocesses, andstopswaitingatline2. Let us now consider a correct reader process p . It follows from the same reasoning as before that i thereaderreceivesthemessage STATE(rsn,−) fromatleast(n−t)processes (lines5and12). Hence, it remains to prove that the second part of the waiting predicate, namely swsn ≥ maxwsn (line 5) i becomes eventually true, where maxwsn is the greatest write sequence number received by p in a i message STATE(rsn,−). Letpj be the sender of this message. The following list of items is such that itemx =⇒ item(x+1),fromwhichfollowsthatswsn ≥ maxwsn(line5)iseventuallysatisfied. i 1. pj updated wsnj tomaxwsn(line7)beforesending STATE(rsn,maxwsn) (line12). 2. Hence,pj receivedpreviously themessage WRITE(maxwsn,−), andbroadcast itthefirsttimeit receivedit(line8). 3. It follows that any correct process receives the message WRITE(maxwsn,−) (at least from pj), andbroadcasts itthefirsttimeitreceivesit(line8). 4. Consequently, pi eventually receives the message WRITE(maxwsn,−) from (n−t) processes. When this occurs, it updates swsn (line 10), which is then ≥ maxwsn, which concludes the i proofofthetermination ofareadoperation. ✷ Lemma1 Lemma2 TheregisterREG isatomic. Proof Let read[i,x] be a read operation issued by a process p which returns the value with sequence i number x, and write[y] be the write operation which writes the value with sequence number y. The proofofthelemmaistheconsequence ofthethreefollowingclaims. • Claim1. Ifread[i,x]terminates beforewrite[y]starts,thenx < y. • Claim2. Ifwrite[x]terminatesbeforeread[i,y]starts,thenx ≤ y. • Claim3. Ifread[i,x]terminates beforeread[j,y]starts,thenx ≤ y. 8 Claim 1 states that no process can read from the future. Claim 2 states that no process can read over- writtenvalues. Claim3statesthatthereisnonew/oldreadinversions [3,11]. ProofofClaim1. This claim follows from the following simple observation. When the writer executes write[y], it first increases its local variable wsn which becomes greater than any sequence number associated with its previous writeoperations (line 1). Hence ifread[i,x]terminates before write[y]starts, wenecessarily havex < y. ProofofClaim2. Itfollowsfrom line 2and lines7-8that, whenwrite[x]terminates, there isasetQ ofatleast(n−t) w processes p such that wsn ≥ x. On another side, due to lines 4-5 and line 12, read[i,y] obtains a k k message STATE() fromasetQr ofatleast(n−t)processes. As |Q | ≥ n − t, |Q | ≥ n − t, and n > 2t, it follows that Q ∩ Q is not empty. There is w r w r consequently a process p ∈ Q ∩ Q , such that that wsn ≥ x. Hence, p sent to p the message k w r k k i STATE(−,z), wherez ≥ x. Due to (a) the definition of maxwsn ≥ z, (b) the predicate swsn ≥ maxwsn ≥ z (line 5), and i (c)the value ofswsn = y,itfollows that y = swsn ≥ z when read[i,y] stops waiting atline 5. As, i i z ≥x,itfollowsy ≥ x,whichprovestheclaim. ProofofClaim3. When read[i,x] stops waiting at line 5, it returns the value res associated with the sequence number i swsni = x. Processpi previously receivedthemessage WRITE(x,−) fromasetQr1 ofatleast(n−t) processes. Thesame occurs for pj, which, before returning, received the message WRITE(y,−) from a setQr2 ofatleast(n−t)processes. As|Qr1|≥ n−t,|Qr2| ≥ n−t,andn > 2t,itfollowsthatQr1∩Qr2isnotempty. Hence,thereis aprocess pk which sent STATE(,x) to pi,and later sent STATE(−,y) topj. Asswsnk never decreases, itfollowsthatx ≤ y,whichcompletestheproofofthelemma. ✷ Lemma2 Theorem1 Algorithm 1implementsanSWMRatomicregister inCAMP [t < n/2]. n,t Proof TheprooffollowsfromLemma1(termination) andLemma2(atomicity). ✷ Theorem1 5.2 Time-efficiency: the boundeddelayassumption Asalready indicated, thisunderlying synchrony assumption considers thateverymessagetakes atmost ∆time units. Moreover, let us remind that aread (which started at timeτ )iswrite-latency-free ifitis r notconcurrent withawrite,andthelastpreceding writedidnotcrashandstartedattimeτ < τ −∆. w r Lemma3 Awriteoperation takesatmost2∆timeunits. Proof The case of the writer is trivial. The message WRITE() broadcast by the writer takes at most ∆ timeunits, asdotheacknowledgment messagesWRITE() sentbyeachprocess tothewriter. Inthiscase 2∆ correspond toacausality-related maximal round-trip delay (the reception of amessage triggers the sending ofanassociated acknowledgment). ✷ Lemma3 9 Whenthewriterdoesnotcrashwhileexecutingawriteoperation Thecaseswherethewriterdoes notcrashwhileexecuting awriteoperation arecaptured bythenexttwolemmas. Lemma4 Awrite-latency-free readoperationtakesatmost2∆timeunits. Proof Let p be a process that issues a write-latency-free read operation, and τ be its starting time. i r Moreover, Let τ the starting time of the last preceding write. As the read is write latency-free, we w have τ + ∆ < τ . Moreover, as messages take at most ∆ time units, and the writer did not crash w r whenexecuting thewrite,each non-crashed process pk received themessage WRITE(x,−) (sentbythe preceding write at time τ + ∆ < τ ), broadcast it (line 8), and updated its local variables such that w r wehave wsnk = x(lines 7-11)atimeτw +∆ < τr. Hence, all themessages STATE() received by the readerp carrythewritesequence numberx. Moreover, duetothebroadcast ofline8executedbyeach i correct process, wehave swsn = x at some time τ +2∆ < τ +∆. It follows that the predicate of i w r line5issatisfiedatp within2∆timeunitsafteritinvoked thereadoperation. ✷ i Lemma4 Lemma5 Areadoperationwhichisnotwrite-latency-free, andduringwhichthewriterdoesnotcrash duringtheinterfering writeoperation, takesatmost3∆. Proof Letusconsiderareadoperationthatstartsattimeτ ,concurrentwithawriteoperationthatstarts r attimeτ andduringwhichthewriterdoesnotcrash. Fromthereadoperation pointofview,theworst w caseoccurswhenthereadoperationisinvokedjustaftertimeτ −∆,letussayattimeτ = τ −∆+ǫ. w r w As a message STATE(rsn,−) is sent by return when a message READ(rsn) is received, the messages STATE(rsn,−) received by pi by time τr + 2∆ can be such that some carry the sequence number x (duetolastpreviouswrite)whileotherscarrythesequencenumberx+1(duetotheconcurrentwrite)4. Hence, maxwsn = x or maxwsn = x + 1 (predicate of line 5). If maxwsn = x, we also have swsn = x and p terminates its read. Ifmaxwsn = x+1, p mustwait until swsn = x+1, which i i i i occursatthelatestatτw+2∆(whenpi receivesthelastmessageofthe(n−t)messages WRITE(y,−) which makes true the predicates of lines 9-10, thereby allowing the predicate of line 5 to be satisfied). Whenthisoccurs,p terminatesitsreadoperation. Asτ = τ +∆−ǫ,p returnsatthelatestτ +3∆−ǫ i w r i r timeunitsafteritinvoked thereadoperation. ✷ Lemma4 Whenthewritercrasheswhileexecutingawriteoperation Theproblem raisedbythecrashofthe writerwhileexecutingthewriteoperationiswhenitcrasheswhilebroadcastingthemessageWRITE(x,−) (line1): someprocesses receivethismessage by∆timeunits, whileotherprocesses donot. Thisissue is solved by the propagation of the message WRITE(x,−) by the non-crashed processes that receive it (line8). Thismeansthat,intheworstcase(asinsynchronous systems),themessageWRITE(x,−)must be forwarded by (t+1) processes before being received by all correct processes. This worst scenario mayentailacostof(t+1)∆timeunits. Figure2presentsasimplemodificationofAlgorithm 1,whichallowsafastimplementation ofread operations whoseexecutionsareconcurrentwithawriteoperationduringwhichthewritercrashes. The modifications areunderlined. When a process pi receives a message READ(), it now returns a message STATE() containing an additional field,namelythecurrentvalueofreg ,itslocalcopyofREG (line12). i Whenaprocess pi receives fromaprocess pj amessage STATE(−,wsn,v), itusesitinthewaiting predicateofline5,butexecutesbeforethelines7-11,asifthismessagewasWRITE(wsn,v). According to the values of the predicates of lines 7, 9, and 10, this allows p to expedite the update of its local i variables wsn ,reg ,swsn ,andres ,therebyfavoring fasttermination. i i i i 4 Messages STATE(rsn,x) are sent by the processes that received READ(rsn) before τw, while the messages STATE(rsn,x+1)aresentbytheprocessesthatreceivedREAD(rsn)betweenτwandτr+∆=τw+ǫ. 10