ACIA, not ACID: Conditions, Properties and Challenges

Yuqing Zhu⋆, Jianxun Liu⋆, Mengying Guo⋆, Wenlong Ma⋆, Guolei Yi†, Yungang Bao⋆
⋆Institute of Computing Technology, Chinese Academy of Sciences    †Baidu

Abstract

Although ACID is the previous golden rule for transaction support, durability is now not a basic requirement for data storage. Rather, high availability is becoming the first-class property required by online applications. We show that high availability of data is almost surely a stronger property than durability. We thus propose ACIA (Atomicity, Consistency, Isolation, Availability) as the new standard for transaction support. Essentially, the shift from ACID to ACIA is due to the change of assumed conditions for data management. Four major condition changes exist. With ACIA transactions, more diverse requirements can be flexibly supported for applications through the specification of consistency levels, isolation levels and fault tolerance levels. Clarifying the ACIA properties enables the exploitation of techniques used for ACID transactions, while also bringing about new challenges for research.

1 Introduction

Transaction support has been recognized again as indispensable for online applications in recent years [67]. Not implementing transactions in highly available datastores is even considered one's biggest mistake [7]. In recent years, systems like Megastore [15] and Spanner [29] have emerged; and, academic solutions for transactional support have also been proposed, e.g., MDCC [42], replicated commit [54], and TAPIR [73]. These emergent proposals guarantee the ACID properties of transactions in distributed replicated datastores. More importantly, they simultaneously consider the guarantee of high availability through data replication.

High availability is now the de facto first-class property required by online applications. Without high availability, even the temporary inaccessibility of data or service can lead to great economic loss [4, 5, 6]. High availability was once guaranteed by abandoning ACID (Atomicity, Consistency, Isolation and Durability) transactions and supporting only BASE (Basically Available, Soft state and Eventual consistency) data access [62]. As the CAP Theorem [21] states that only two can be guaranteed among consistency, availability and partition tolerance, the developers of BASE systems trade consistency for availability, guaranteeing eventual consistency instead of strong consistency and transactions [70, 71, 32]. In fact, the CAP Theorem does not indicate that transactions must be relinquished for high availability, as has since been clarified [8, 22, 63, 14]. Efforts have thus been devoted to supporting transactions with high availability in recent years [30, 74, 60, 57, 58, 73, 54, 29, 15]. A concept of highly-available transactions has even been proposed [13, 11]. Transactions now must be supported with high availability for online applications.
In this paper, we propose ACIA as the new standard for transaction support, instead of ACID, replacing Durability with Availability. As demonstrated by the years of practice with the BASE model, applications can work well with systems and data in soft state [72, 62, 2, 32, 43], even under datacenter-scale power outages [3]. Soft state only requires that correct data or states can be regenerated at the expense of additional computation or file I/O on faults such as network partition [35]. With soft state guaranteeing availability, durability is no longer a fundamental property required by data management. In fact, as long as data is available, applications do not care whether the correct data is durably stored or regenerated on the fly in the system.

We show that high availability of data is in fact a stronger property than durability (Section 2). Highly available data can be made durable, while durable data is not necessarily highly available. Hence, ACIA transactions cannot be supported by all proposals designed for guaranteeing ACID transactions over replicated datastores. The fundamental reason is that ACIA transactions assume new conditions commonly made for online applications in system implementations (Section 3). These conditions are different from those generally assumed for classic data management systems [19]. Assuming the classic conditions, e.g., predictable communication delay, some recent proposals [58] cannot support ACIA transactions. Even when a proposal can support ACIA transactions, it is not necessarily designed without redundant components, e.g., persistent logs [42, 73].

We present a specification of the ACIA properties, which constitute the major highlights of a new transaction paradigm (Section 4). To check whether transactions are supported by a particular system, one only needs to test the system against the ACIA properties. Clarifying the ACIA properties enables implementations to combine different mechanisms that guarantee the properties respectively. Besides, it also enables us to explore the new space of research challenges and problems (Section 5).

2 High Availability vs. Durability

High availability is a property specifying that a database, service or system is continuously accessible to users [35]. Used with transactions, high availability in this paper refers to a property of the database. We say that a database is highly available if any client connected to the system can access any data within the database at any time. Note that this description does not concern implementations. In comparison, the durability property of ACID requires that a committed transaction's effect be reflected by the database in persistent storage, such that the effect of the committed transaction will not be lost even on power failures [39].

As a property of the database, high availability shares a few similarities with durability. First, high availability is a property independent of the atomicity, consistency, and isolation properties. Each of the four properties must be guaranteed by its respective measures. Second, high availability places no constraints on data or replica consistency [37, 68]. A client accessing the highly available database can find the data outdated or up-to-date; and, how inconsistent the returned data can be depends on the consistency level agreed between the client and the database. Third, high availability has no implications for isolation levels. ACID transactions can be implemented with different isolation levels [16], while different isolation levels can result in different database states kept durable. Similarly for the highly available database, the isolation levels supported by the system can also affect the database image observed by a client.

High availability of data is almost surely a stronger property than durability. On the one hand, highly available databases can be made durable easily. Given high availability of data, we can use a client program to traverse the whole database and store all returned values to persistent storage, leading to durability. Nowadays, highly available systems usually distribute their storage across multiple geographic locations. It is highly unlikely that power failures will affect storage in all locations. That is, power failures will almost surely not affect highly available systems, unless the power failure is global (a case with probability zero).
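To make the traversal argument concrete, the following is a minimal sketch of such a client-side backup program. The scan callable stands in for whatever read API a highly available store exposes; the names are illustrative assumptions, not the API of any particular system.

```python
# Sketch: deriving durability from availability with a client-side dump.
# `scan` is a hypothetical stand-in for the read API of a highly available store.
import json

def dump_to_durable_storage(scan, path):
    """Traverse the whole (always-accessible) database and persist it."""
    with open(path, "w") as f:
        for key, value in scan():
            # every item is reachable because the store is highly available
            f.write(json.dumps([key, value]) + "\n")
    # once flushed to disk, the copy is durable independently of the store

if __name__ == "__main__":
    data = {"x": 1, "y": 2}          # toy in-memory stand-in for the store
    dump_to_durable_storage(lambda: data.items(), "backup.jsonl")
```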
Figure 1: The typical architecture of a durable storage engine.

On the other hand, durable systems are not necessarily accessible at any time. To guarantee data durability, log-based crash recovery has been extensively studied in ACID transactional database research [40, 55]. The typical architecture of such durable and recoverable database systems is demonstrated in Figure 1. The four basic building blocks of such systems are the durable data image, durable logs, the buffer manager and the recovery manager. The recovery manager relies on the durable logs to recover a correct database on errors, while the buffer manager manages durable data and logs for program access. The underlying assumption for such an architecture is that recovery allows a system down time. In comparison, highly available systems handle failures or errors without system down time.

Figure 2: The typical architecture of a highly-available storage engine.

With high availability, a database will not become inaccessible because of partial system failures such as node failure or network partition. Replication is the typical mechanism to guarantee high availability of data. Exploiting fault-tolerant replication algorithms, partial system failures can be tolerated without bringing the system down. The number of replica failures a system can tolerate depends on the algorithms used and the consistency level specified. For example, eventually consistent systems can guarantee high availability as long as all replicas do not fail simultaneously [71]. A fault-tolerant system can be made highly available by repairing failed components in a timely manner. The architecture of a typical storage engine guaranteeing high availability is demonstrated in Figure 2. The data manager allows the data to reside in memory or in persistent storage.
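As a concrete illustration of how the tolerable number of replica failures depends on the replication algorithm and the consistency level, the short sketch below contrasts majority-quorum replication, which tolerates ⌊(n−1)/2⌋ crashed replicas, with an eventually consistent store that remains available as long as at least one replica is alive [71]. The function names are ours.

```python
# Sketch: tolerable replica failures under two common replication settings.
def majority_quorum_tolerance(n):
    """Majority-quorum (Paxos-style) replication over n replicas."""
    return (n - 1) // 2          # e.g. 5 replicas tolerate 2 crashed replicas

def eventual_consistency_tolerance(n):
    """Eventually consistent access served by any live replica [71]."""
    return n - 1                 # available until the last replica fails

for n in (3, 5, 7):
    print(n, majority_quorum_tolerance(n), eventual_consistency_tolerance(n))
```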
3 New Implementation Conditions

Conditions previously assumed for distributed system implementations have become invalid for the large-scale distributed systems that are now widely deployed to support the plethora of online applications. One example is whether a system can be inaccessible for some time. Previously, systems could have a down time for failure recovery (MTTR, Mean Time To Recovery); now, the system must remain accessible to meet the high availability requirement. Another example is whether a system can have synchronized clocks across servers. Previously, the system servers were assumed to have synchronized clocks, enabling timestamp-based distributed concurrency control (CC) [17]; now, the clocks on different servers can be coordinated at most to a certain precision, requiring a different CC design [29].

We capture the changed conditions for system implementations by observing how Consensus algorithms are exploited in replicated systems to guarantee high availability [23, 20, 64, 24]. The most widely used Consensus algorithm is Paxos [45, 24]. Years of practice have proven that Paxos is feasible in the practical scenarios of large-scale distributed systems. The Paxos algorithm assumes the asynchronous system model [31] and crash-stop failures [61]. The crash-stop failure is related to the conditions of implementation compliance and node recovery moment, while the asynchronous system model is related to the phenomena of unpredictable message delays, inconsistent clocks and unreliable failure detection. The conditions and the phenomena must all be considered in system implementations.

Implementation Compliance. A server node in the system must behave as the implementation dictates. Besides, it must follow the protocol implemented by the system to send and receive messages. A node can either behave according to the implementation or crash and stay in a stopped state. The above holds when the crash-stop failure is assumed. Other failure modes also exist. A commonly studied failure is when the server can send arbitrary messages to other servers, without complying with the implementation. This failure mode is called the Byzantine failure [50]. At present, the most commonly assumed failure is the crash-stop failure.

Node Recovery Moment. In large-scale distributed systems, the failures of individual server nodes are considered common situations, which must be tolerated by the system software. Replication is the common technique for fault tolerance and high availability. While the model of replicated state machines (RSM) [44] offers a measure to describe and analyze replication, Consensus algorithms coordinate the RSM to process commands properly even under node failures. The decision of each command for the RSM is modeled as a Consensus problem. Designed to solve the Consensus problem [33], one run of a Consensus algorithm can reach a single agreement on the command for the RSM (denoted as single-decree in a related work [59]). Therefore, multiple runs of the algorithm are needed to make the system functional.

A node cannot recover and rejoin the system during a run of the Consensus algorithm. During a run, Consensus algorithms generally assume a static membership of nodes (called participants, acceptors or learners) [46]. Nodes can leave the membership due to failures but cannot join the membership. Rather, the recovery and the change of membership must be handled before or after a run of the Consensus algorithm. Otherwise, extra mechanisms called reconfiguration [48, 49] must be added to the implementation. In fact, a recovered node does not need to join the system during an algorithm run unless the number of failed nodes exceeds the algorithm's fault tolerance level.
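To illustrate the preceding discussion of an RSM coordinated by successive single-decree runs, the sketch below drives a replicated state machine with one consensus run per log slot. The decide callable abstracts a single-decree agreement (e.g., one Paxos instance) and is only a placeholder under our own assumptions, not an implementation of Paxos.

```python
# Sketch: an RSM driven by repeated single-decree consensus runs.
# `decide(slot, proposal)` abstracts one run of a consensus algorithm and must
# return the same chosen command on every replica for a given slot.
class ReplicatedStateMachine:
    def __init__(self, decide):
        self.decide = decide
        self.log = []                         # slot i holds the command agreed in run i
        self.state = {}

    def submit(self, command):
        slot = len(self.log)
        chosen = self.decide(slot, command)   # one single-decree agreement
        self.log.append(chosen)
        self.apply(chosen)
        return chosen

    def apply(self, command):
        key, value = command                  # toy commands: (key, value) writes
        self.state[key] = value

# Usage with a trivial stand-in that "chooses" whatever is proposed:
rsm = ReplicatedStateMachine(decide=lambda slot, proposal: proposal)
rsm.submit(("x", 1))
```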
Unpredictable Message Delay. The delay for a message to reach its receiver is not predictable. This condition exists because large-scale systems supporting online applications can be globally distributed and the wide-area system network is highly unpredictable. It is possible that a message sent by one server travels in the system network for an indefinitely long time, making the message look like a lost message. Note that this condition also models the situation that messages can get lost for some undetermined reason, e.g., network congestion. Implementations relying on predictable communication delays [58] are not suitable for real applications under this condition.

Inconsistent Clocks. Different servers in the same system can hardly have consistent clocks. The reason is twofold. First, the local clocks of servers can drift independently as time passes. Second, the communication delay is neither constant nor predictable. Although techniques exist to synchronize the clocks to a certain precision [29], designs relying on precise timestamps [17] will generally not work in highly available systems. To track the global ordering of events, mechanisms such as vector clocks are needed [44, 32, 52].
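For illustration, a textbook vector-clock sketch [44]: each server increments its own entry on a local event, merges the incoming clock on receive, and two events are ordered only when one clock dominates the other; updates whose clocks are incomparable are concurrent, which is exactly what eventually consistent stores must reconcile [32].

```python
# Sketch: textbook vector clocks for tracking the global ordering of events.
class VectorClock:
    def __init__(self, node, peers):
        self.node = node
        self.clock = {p: 0 for p in peers}

    def tick(self):                      # local event or message send
        self.clock[self.node] += 1
        return dict(self.clock)

    def merge(self, other):              # on message receive
        for p, t in other.items():
            self.clock[p] = max(self.clock.get(p, 0), t)
        self.tick()

    @staticmethod
    def happened_before(a, b):           # a -> b iff b dominates a
        keys = set(a) | set(b)
        return all(a.get(p, 0) <= b.get(p, 0) for p in keys) and a != b

s1 = VectorClock("s1", ["s1", "s2"]); snapshot = s1.tick()
s2 = VectorClock("s2", ["s1", "s2"]); s2.merge(snapshot)
assert VectorClock.happened_before(snapshot, s2.clock)
```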
Unreliable Failure Detection. A server cannot precisely distinguish message slowness from failures of other servers. Since a message can travel in the system for an indefinitely long time, a server has no way to differentiate whether its message is slow or the interacting server has failed. Furthermore, if a server is not receiving messages from any other server, it can hardly tell whether all other servers have failed, whether a network partition has happened, or whether all messages have suddenly become slow. None of the three cases can be ascertained. That is, reliable failure detection is not possible, which then rules out all possible solutions [34]. To solve the problem in practice, a server will resend its messages and take communication timeouts as the signal for server failures [27, 29, 32, 1, 28]. Therefore, the corresponding condition for failure detection is that server failures can be detected in some way such as timeouts.
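A minimal sketch of this practical workaround: a timeout-based failure detector that suspects a peer once no message has arrived within a timeout. It is necessarily unreliable, since a slow or partitioned peer is indistinguishable from a failed one.

```python
# Sketch: a timeout-based (and therefore unreliable) failure detector.
import time

class TimeoutFailureDetector:
    def __init__(self, peers, timeout_s):
        self.timeout = timeout_s
        now = time.monotonic()
        self.last_heard = {p: now for p in peers}

    def on_message(self, peer):
        """Any received message (e.g., a heartbeat) refreshes the peer."""
        self.last_heard[peer] = time.monotonic()

    def suspected(self):
        """Peers silent for longer than the timeout are suspected to have
        failed; a suspicion can be wrong if the peer is merely slow."""
        now = time.monotonic()
        return [p for p, t in self.last_heard.items() if now - t > self.timeout]

fd = TimeoutFailureDetector(["s1", "s2"], timeout_s=0.5)
fd.on_message("s1")
time.sleep(0.6)
print(fd.suspected())    # both peers have been silent past the timeout
```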
Conditions for ACIA vs. ACID. We summarize the changes of typical implementation conditions from ACID to ACIA in Table 1. Four main changes exist. First, classic ACID designs assume the system recovers with down time, while ACIA, requiring high availability, assumes no system down time. Second, classic ACID designs assume predictable message delays to enable timestamp-based mechanisms, while ACIA assumes unpredictable message delays. Third, classic ACID designs are for systems with limited distribution and synchronized server clocks, while ACIA is for large-scale systems in which synchronized server clocks are impossible. Fourth, classic ACID systems stop and restart on failure detection and recovery, while ACIA systems assume built-in fault-tolerance mechanisms. These changes and differences hold in general. There exist early systems assuming some of the conditions of ACIA systems, but not all conditions are considered. Now, with ACIA requirements, all the conditions must be considered and handled in system implementations.

Table 1: Implementation Conditions: ACID vs. ACIA.

  Condition              ACID                ACIA
  System recovery        With down time      Without down time
  Message delay          Predictable         Unpredictable
  Synchronized clocks    Possible            Impossible
  On failure detection   Stop and restart    Fault tolerance

4 Specification of ACIA Properties

Given the conditions of Section 3, the four properties of ACIA are specified as follows.

• Atomicity. The effects of either all or none of the operations in a transaction are reflected in the database, with the user knowing which of the two outcomes holds.

• Consistency. Each successful transaction commits only legal results, preserving the consistency of the database. Legal results comply with the rules and consistency levels specified for the database, e.g., integrity constraints [37] and replica consistency levels [68].

• Isolation. Operations within a transaction must be isolated from other transactions running concurrently. How much transactions can be isolated from other transactions is defined by the isolation levels [16, 10], which are guaranteed by different concurrency control mechanisms [66, 18, 69].

• Availability. Once a transaction has committed its results, the system must guarantee that these results are reflected in the database, whose data can be accessed by any client connected to the system.

The Clients' View. Previously with ACID, the client understands that transactions are atomic and durable and that the consistency and the isolation conditions can be specified for different applications. Typical consistency level agreements are integrity constraints like correlated updates for foreign keys [19]. Typical isolation level agreements are serializability, snapshot isolation or read committed [16].

With ACIA, the database is guaranteed to be continuously accessible and transactions are atomic. The various requirements of the client can be flexibly supported through the specification of consistency levels, isolation levels and fault tolerance levels. Different from ACID, the consistency levels for ACIA need to include replica consistency specifications. Besides, new isolation levels are possible and remain to be added [75]. The levels of fault tolerance can also be specified. The level of fault tolerance can be traded off with performance. For example, a low level of fault tolerance enables the system to use Fast Paxos for low latency, while a high level can result in the system using classic Paxos but with higher latency [45, 47].
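To make this client-facing specification concrete, here is a sketch of what a per-transaction options record could look like; the level names and the validation rule are illustrative assumptions of ours, not a vocabulary fixed by this paper.

```python
# Sketch: per-transaction specification of consistency, isolation and
# fault-tolerance levels. The concrete names and checks are illustrative only.
from dataclasses import dataclass

@dataclass
class TxnOptions:
    replica_consistency: str = "eventual"   # e.g. eventual, bounded, strong
    isolation: str = "read_committed"       # e.g. read_committed, snapshot, serializable
    fault_tolerance: int = 1                # number of replica failures to survive

    def validate(self, num_replicas):
        # a majority-quorum implementation can only honor f <= (n - 1) / 2 [45, 47]
        if self.fault_tolerance > (num_replicas - 1) // 2:
            raise ValueError("requested fault tolerance needs more replicas")

opts = TxnOptions(replica_consistency="strong", isolation="snapshot",
                  fault_tolerance=2)
opts.validate(num_replicas=5)
```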
Implementation. Clarifying the ACIA properties enables us to exploit early research results. Similar to the implementation of ACID transactions, the support of ACIA transactions requires the implementation of consistency compliance, concurrency control and commit coordination, as well as replica control. In the past, the former three aspects were discussed separately with different measures. For example, consistency compliance can be about how foreign keys can be updated efficiently; concurrency control can be about which isolation levels are best for an application and which scheme is most efficient for a workload; and, commit coordination is about how atomicity is guaranteed through different protocols like 2PC or 1PC. We can thus exploit and combine such mechanisms to guarantee the ACIA properties.

Although the four properties of ACIA must be guaranteed simultaneously for transactions, the guarantees of the first three properties are independent. As each of the three properties is guaranteed through different schemes, numerous combinations of the schemes are possible. Furthermore, the consistency and the isolation can be guaranteed with multiple choices; various combinations of these choices are possible, leading to flexible application usages. Current solutions to transaction support in highly-available systems usually intertwine the mechanisms that guarantee these properties. For example, highly-available transactions [13, 12] mix replica consistency and availability with isolation levels, leaving a vague picture that blurs replica control and concurrency control. Clarifying the properties of ACIA enables further exploration in system designs and implementations.

5 Challenges

We now discuss the challenges in supporting ACIA transactions. These challenges arise mainly due to the changes of implementation conditions.

Atomic Commit Problem Redefined. The classic problem of atomic commit assumes each participant serves orthogonal data units for a distributed transaction [19]. Many widely used protocols are designed to solve this classic atomic commit problem [55, 9, 36]. For ACIA transactions, multiple participants can serve the same data units. These participants can fail independently, but not all participants serving the same data unit fail simultaneously. Obviously, the classic problem definition of atomic commit is not suited to ACIA transactions. A new problem definition of atomic commit is needed.

Furthermore, typical solutions to the classic atomic commit problem rely on persistent logs [29, 55, 9]. With high availability of data, keeping durable logs is not a wise choice. Therefore, new mechanisms not using persistent logs need to be devised to solve the new atomic commit problem of ACIA. In fact, as the classic atomic commit solutions follow a vote-after-decide process [56], we can take the vote-before-decide alternative and transform the new atomic commit problem into the Consensus problem [76], making a solution based on existing algorithms possible.
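The sketch below illustrates only the general idea of replacing the coordinator's durable log with an agreement instance: votes are collected first, and the commit/abort decision is whatever value the consensus instance chooses, so the decision stays available as long as a quorum of deciders is alive. This is an illustration under our own assumptions, not the protocol of [76]; consensus_decide is a hypothetical stand-in for a replicated agreement primitive.

```python
# Sketch: a commit decision reached through consensus instead of a durable log.
# `consensus_decide(txn_id, proposal)` abstracts one replicated agreement
# instance; it is a hypothetical interface, not the protocol of [76].
def logless_commit(txn_id, participants, consensus_decide):
    # 1. collect votes from all participants first
    votes = [p.vote(txn_id) for p in participants]
    proposal = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    # 2. the decision is the value chosen by the consensus instance, which is
    #    replicated and thus remains available without any persistent log
    decision = consensus_decide(txn_id, proposal)
    for p in participants:
        p.finish(txn_id, decision)
    return decision

class Participant:
    def __init__(self, ok=True):
        self.ok = ok
    def vote(self, txn_id):
        return "YES" if self.ok else "NO"
    def finish(self, txn_id, decision):
        pass  # apply or roll back txn_id locally

print(logless_commit("t1", [Participant(), Participant()],
                     lambda tid, prop: prop))  # trivial stand-in for consensus
```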
Comprehensive Consistency Levels. The consistency of ACID is enforced through the compliance check of all defined rules, including constraints, cascades, triggers, and any combination thereof. These rules are based on a single-replica database image. In comparison, ACIA databases are inherently distributed and replicated. Therefore, the rules of constraints, cascades, triggers, and their combinations must be extended to consider replicas. Furthermore, there exist multiple consistency levels for replicated data, and these consistency levels are allowed by different applications [68, 41, 53, 38]. The following questions thus remain to be answered: (1) how can these replica consistency levels be combined with classic consistency rules to offer comprehensive consistency levels? (2) which of the comprehensive consistency levels are allowed by applications? and, (3) how can the new consistency levels be specified in an application-understandable manner?

Application-Understandable Isolation Levels. Classic concurrency control mechanisms study how operations of different transactions can be ordered correctly. For ACIA transactions, concurrency control must also consider how each operation can be ordered on different replicas, leading to a more complex global view of operation executions. The phenomena-based specification [16, 12, 10] of classic isolation levels cannot cover all isolation levels possible for ACIA transactions, e.g., partition-based isolation levels [75]. The spectrum of isolation levels possible for ACIA transactions needs to be studied, and a corresponding specification is needed. Besides, the specification must be made application-understandable, as isolation levels are mainly for applications to choose the correct implementations.

Failure Detection Mechanisms & Fault-Tolerance Levels. Failure detection is the basis for fault tolerance. Systems supporting ACIA transactions must have built-in fault tolerance mechanisms and thus failure detection mechanisms. Failure detectors with different reliability require different solutions [65, 26, 25], which can tolerate varied numbers of failures (denoted as fault-tolerance levels). Current implementations of failure detection are mainly based on timeouts [51]. Systems implementing failure detectors with different reliability remain to be devised. And, a specification of the fault-tolerance levels supported by such systems needs to be provided.

6 Conclusion

This paper makes the key observation that the high availability requirement of data has changed the conditions for transaction support. In this paper, we have shown that high availability is almost surely a stronger property than durability. In proposing ACIA instead of ACID, we have investigated the conditions for system implementations supporting ACIA transactions. The change of implementation conditions leads to new challenges for transaction support in large-scale distributed systems. We analyze the challenges regarding each property of ACIA. The implementation-independent specification of the ACIA properties not only enables the reuse of mechanisms previously devised to support ACID transactions, but also opens up a new research space for future exploration.

Acknowledgments. This work is supported in part by the State Key Development Program for Basic Research of China (Grant No. 2014CB340402) and the National Natural Science Foundation of China (Grant No. 61303054).

References

[1] Cassandra website. http://cassandra.apache.org/.
[2] HBase website. http://hbase.apache.org/.
[3] Lessons Netflix learned from the AWS outage. http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html.
[4] Lightning causes Amazon, Microsoft cloud outages in Europe, August 2011. http://www.crn.com/news/cloud/231300384/lightning-causes-amazon-microsoft-cloud-outages-in-europe.htm.
[5] Reddit, Quora, Foursquare, HootSuite go down due to Amazon EC2 cloud service troubles, April 2011. http://www.huffingtonpost.com/2011/04/21/amazon-ec2-takes-down-reddit-quora-foursquare-hootsuite_n_851964.html.
[6] Amazon cloud goes down Friday night, taking Netflix, Instagram and Pinterest with it, October 2012. http://www.forbes.com/sites/anthonykosner/2012/06/30/amazon-cloud-goes-down-friday-night-taking-netflix-instagram-and-pinterest-with-it/.
[7] Jeff Dean on large-scale deep learning at Google, 2016. http://highscalability.com/blog/2016/3/16/jeff-dean-on-large-scale-deep-learning-at-google.html.
[8] Abadi, D. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer 45, 2 (2012), 37–42.
[9] Abdallah, M., Guerraoui, R., and Pucheral, P. One-phase commit: does it make sense? In Proceedings of the 1998 International Conference on Parallel and Distributed Systems (1998), IEEE, pp. 182–192.
[10] Adya, A., Liskov, B., and O'Neil, P. Generalized isolation level definitions. In Proceedings of the 16th International Conference on Data Engineering (2000), IEEE, pp. 67–78.
[11] Bailis, P., Davidson, A., Fekete, A., Ghodsi, A., Hellerstein, J. M., and Stoica, I. Highly available transactions: Virtues and limitations. Proceedings of the VLDB Endowment 7, 3 (2013), 181–192.
[12] Bailis, P., Fekete, A., Franklin, M. J., Ghodsi, A., Hellerstein, J. M., and Stoica, I. Feral concurrency control: An empirical investigation of modern application integrity. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (2015), ACM, pp. 1327–1342.
[13] Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J. M., and Stoica, I. HAT, not CAP: Towards highly available transactions. In Proceedings of HotOS'13 (Berkeley, CA, USA, 2013), USENIX Association, pp. 24–24.
[14] Bailis, P., and Ghodsi, A. Eventual consistency today: limitations, extensions, and beyond. Commun. ACM 56, 5 (May 2013), 55–63.
[15] Baker, J., Bond, C., Corbett, J., Furman, J., Khorlin, A., Larson, J., Léon, J.-M., Li, Y., Lloyd, A., and Yushprakh, V. Megastore: Providing scalable, highly available storage for interactive services. In Proc. of CIDR (2011), pp. 223–234.
[16] Berenson, H., Bernstein, P., Gray, J., Melton, J., O'Neil, E., and O'Neil, P. A critique of ANSI SQL isolation levels. ACM SIGMOD Record 24, 2 (1995), 1–10.
[17] Bernstein, P. A., and Goodman, N. Timestamp-based algorithms for concurrency control in distributed database systems. In Proceedings of the Sixth International Conference on Very Large Data Bases (1980), VLDB Endowment, pp. 285–300.
[18] Bernstein, P. A., and Goodman, N. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR) 13, 2 (1981), 185–221.
[19] Bernstein, P. A., Hadzilacos, V., and Goodman, N. Concurrency Control and Recovery in Database Systems, vol. 370. Addison-Wesley, New York, 1987.
[20] Bolosky, W. J., Bradshaw, D., Haagens, R. B., Kusters, N. P., and Li, P. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of NSDI'11 (2011), USENIX Association, pp. 11–11.
[21] Brewer, E. A. Towards robust distributed systems (abstract). In Proceedings of PODC'00 (2000), ACM, pp. 7–.
[22] Brewer, E. A. Pushing the CAP: strategies for consistency and availability. Computer 45, 2 (2012), 23–29.
[23] Burrows, M. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (2006), USENIX Association, pp. 335–350.
[24] Chandra, T., Griesemer, R., and Redstone, J. Paxos made live: an engineering perspective (2006 invited talk). In Proceedings of PODC'07 (2007), vol. 7.
[25] Chandra, T. D., Hadzilacos, V., and Toueg, S. The weakest failure detector for solving consensus. Journal of the ACM (JACM) 43, 4 (1996), 685–722.
[26] Chandra, T. D., and Toueg, S. Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM) 43, 2 (1996), 225–267.
[27] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems 26, 2 (2008), 4.
[28] Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., and Yerneni, R. PNUTS: Yahoo!'s hosted data serving platform. Proceedings of VLDB'08 1, 2, 1277–1288.
[29] Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., et al. Spanner: Google's globally-distributed database. Proceedings of OSDI (2012), 1.
[30] Cowling, J. A., and Liskov, B. Granola: Low-overhead distributed transaction coordination. In USENIX Annual Technical Conference (2012), vol. 12.
[31] Cristian, F. Synchronous and asynchronous. Commun. ACM 39, 4 (Apr. 1996), 88–97.
[32] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proceedings of SOSP'07, vol. 7, pp. 205–220.
[33] Fischer, M. J. The consensus problem in unreliable distributed systems (a brief survey). In International Conference on Fundamentals of Computation Theory (1983), Springer, pp. 127–140.
[34] Fischer, M. J., Lynch, N. A., and Paterson, M. S. Impossibility of distributed consensus with one faulty process. Journal of the ACM (JACM) 32, 2 (1985), 374–382.
[35] Fox, A., Gribble, S. D., Chawathe, Y., Brewer, E. A., and Gauthier, P. Cluster-based scalable network services. In ACM SIGOPS Operating Systems Review (1997), vol. 31, ACM, pp. 78–91.
[36] Gray, J., and Lamport, L. Consensus on transaction commit. ACM Trans. Database Syst. 31, 1 (Mar. 2006), 133–160.
[37] Gray, J., and Reuter, A. Transaction Processing: Concepts and Techniques. Elsevier, 1992.
[38] Guerraoui, R., Pavlovic, M., and Seredinschi, D.-A. Incremental consistency guarantees for replicated objects. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (2016), USENIX Association, pp. 169–184.
[39] Haerder, T., and Reuter, A. Principles of transaction-oriented database recovery. ACM Comput. Surv. 15, 4 (Dec. 1983), 287–317.
[40] Kohler, W. H. A survey of techniques for synchronization and recovery in decentralized computer systems. ACM Computing Surveys (CSUR) 13, 2 (1981), 149–183.
[41] Kraska, T., Hentschel, M., Alonso, G., and Kossmann, D. Consistency rationing in the cloud: Pay only when it matters. Proceedings of the VLDB Endowment 2, 1 (2009), 253–264.
[42] Kraska, T., Pang, G., Franklin, M. J., and Madden, S. MDCC: Multi-data center consistency. In EuroSys (2013).
[43] Lakshman, A., and Malik, P. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44, 2 (Apr. 2010), 35–40.
[44] Lamport, L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21, 7 (1978), 558–565.
[45] Lamport, L. The part-time parliament. ACM Transactions on Computer Systems (TOCS) 16, 2 (1998), 133–169.
[46] Lamport, L. Paxos made simple. ACM SIGACT News 32, 4 (2001), 18–25.
[47] Lamport, L. Fast Paxos. Distributed Computing 19, 2 (2006), 79–103.
[48] Lamport, L., Malkhi, D., and Zhou, L. Vertical Paxos and primary-backup replication. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (New York, NY, USA, 2009), PODC'09, ACM, pp. 312–313.
[49] Lamport, L., Malkhi, D., and Zhou, L. Reconfiguring a state machine. SIGACT News 41, 1 (Mar. 2010), 63–73.
[50] Lamport, L., Shostak, R., and Pease, M. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 3 (1982), 382–401.
[51] Leners, J. B., Wu, H., Hung, W.-L., Aguilera, M. K., and Walfish, M. Detecting failures in distributed systems with the Falcon spy network. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (2011), ACM, pp. 279–294.
[52] Lloyd, W., Freedman, M. J., Kaminsky, M., and Andersen, D. G. Don't settle for eventual: Stronger consistency for wide-area storage with COPS. In Proceedings of SOSP 2011 (2011), ACM.
[53] Lu, H., Veeraraghavan, K., Ajoux, P., Hunt, J., Song, Y. J., Tobagus, W., Kumar, S., and Lloyd, W. Existential consistency: measuring and understanding consistency at Facebook. In Proceedings of the 25th Symposium on Operating Systems Principles (2015), ACM, pp. 295–310.
[54] Mahmoud, H. A., Pucher, A., Nawab, F., Agrawal, D., and Abbadi, A. E. Low latency multi-datacenter databases using replicated commits. In Proceedings of the VLDB Endowment (2013).
[55] Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., and Schwarz, P. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems (TODS) 17, 1 (1992), 94–162.
[56] Mohan, C., Lindsay, B., and Obermarck, R. Transaction management in the R* distributed database management system. ACM Trans. Database Syst. 11, 4 (Dec. 1986), 378–396.
[57] Nawab, F., Agrawal, D., and Abbadi, A. E. Message futures: Fast commitment of transactions in multi-datacenter environments. In Proc. of CIDR (2013).
[58] Nawab, F., Arora, V., Agrawal, D., and El Abbadi, A. Minimizing commit latency of transactions in geo-replicated data stores. In Proceedings of SIGMOD'15 (2015), ACM, pp. 1279–1294.
[59] Ongaro, D., and Ousterhout, J. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14) (2014), pp. 305–319.
[60] Patterson, S., Elmore, A. J., Nawab, F., Agrawal, D., and El Abbadi, A. Serializability, not serial: Concurrency control and availability in multi-datacenter datastores. Proceedings of the VLDB Endowment 5, 11 (2012), 1459–1470.
[61] Poledna, S. Fault-Tolerant Real-Time Systems: The Problem of Replica Determinism, vol. 345. Springer Science & Business Media, 2007.
[62] Pritchett, D. BASE: An ACID alternative. Queue 6, 3 (2008), 48–55.
[63] Ramakrishnan, R. CAP and cloud data management. IEEE Computer 45, 2 (2012), 43–49.
[64] Rao, J., Shekita, E. J., and Tata, S. Using Paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow. 4, 4 (Jan. 2011), 243–254.
[65] Reynal, M. A short introduction to failure detectors for asynchronous distributed systems. ACM SIGACT News 36, 1 (2005), 53–70.
[66] Stonebraker, M. Concurrency control and consistency of multiple copies of data in distributed INGRES. IEEE Transactions on Software Engineering, 3 (1979), 188–194.
[67] Stonebraker, M. Stonebraker on NoSQL and enterprises. Commun. ACM 54, 8 (2011), 10–11.
[68] Terry, D. B., Prabhakaran, V., Kotla, R., Balakrishnan, M., Aguilera, M. K., and Abu-Libdeh, H. Consistency-based service level agreements for cloud storage. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013), ACM, pp. 309–324.
[69] Thomasian, A. Concurrency control: Methods, performance, and analysis. ACM Comput. Surv. 30, 1 (Mar. 1998), 70–119.
[70] Vogels, W. Data access patterns in the Amazon.com technology platform. In Proceedings of the 33rd International Conference on Very Large Data Bases (2007), VLDB'07, VLDB Endowment, pp. 1–1.
[71] Vogels, W. Eventually consistent. Communications of the ACM 52, 1 (2009), 40–44.
[72] Xie, C., Su, C., Kapritsos, M., Wang, Y., Yaghmazadeh, N., Alvisi, L., and Mahajan, P. Salt: Combining ACID and BASE in a distributed database. In OSDI (2014), vol. 14, pp. 495–509.
[73] Zhang, I., Sharma, N. K., Szekeres, A., Krishnamurthy, A., and Ports, D. R. Building consistent transactions with inconsistent replication. In Proceedings of SOSP'15 (New York, NY, USA, 2015), ACM.
[74] Zhang, Y., Power, R., Zhou, S., Sovran, Y., Aguilera, M. K., and Li, J. Transaction chains: Achieving serializability with low latency in geo-distributed storage systems. In Proceedings of SOSP'13 (2013), ACM, pp. 276–291.
[75] Zhu, Y., and Wang, Y. Improving transaction processing performance by consensus reduction. In 2015 IEEE International Conference on Big Data (Big Data) (2015), IEEE, pp. 531–538.
[76] Zhu, Y., Yu, P. S., Bao, Y., Yi, G., Ma, W., Guo, M., and Liu, J. To vote before decide: A logless one-phase commit protocol for highly-available datastores. arXiv preprint arXiv:1701.02408 (2017).