Building Fault-Tolerant Consistency Protocols for an Adaptive Grid Data-Sharing Service

Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet

[Research Report] RR-5309, INRIA, 2004, pp. 15. HAL Id: inria-00070691, https://hal.inria.fr/inria-00070691, submitted on 19 May 2006.

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Systèmes communicants, Projet PARIS
Rapport de recherche n˚5309 — Septembre 2004 — 15 pages

Abstract: We address the challenge of sharing large amounts of numerical data within computing grids consisting of cluster federations. We focus on the problem of handling the consistency of replicated data in an environment where the availability of storage resources dynamically changes. We propose a software architecture which decouples consistency management from fault-tolerance management.
We illustrate this architecture with a case study showing how to design a consistency protocol using fault-tolerant building blocks. As a proof of concept, we describe a prototype implementation of this protocol within JUXMEM, a software experimental platform for grid data sharing, and we report on a preliminary experimental evaluation.

Key-words: Consistency protocols, fault tolerance, grid, data sharing

[email protected], [email protected], [email protected]

Unité de recherche INRIA Rennes — IRISA, Campus universitaire de Beaulieu, 35042 RENNES Cedex (France)

Construction de protocoles de cohérence tolérants aux fautes pour un service adaptable de partage de données

Résumé (translated from French): We address the sharing of large amounts of numerical data in computing grids composed of cluster federations. We focus on the problem of managing the consistency of replicated data in an environment where the availability of storage resources changes dynamically. We propose a software architecture that decouples consistency management from fault-tolerance management. We illustrate this architecture with a case study showing how to design a consistency protocol using failure-resilient modules. To validate this concept, we describe the implementation of a prototype of this protocol in JUXMEM, an experimental software platform for grid data sharing, and we present a preliminary experimental evaluation.

Mots-clé (translated): Consistency protocols, fault tolerance, grid, data sharing

1. Introduction

Data management in grid environments is currently a topic of major interest to the grid computing community.
However, as of today, no sophisticated approach has been widely established for efficient data sharing on grid infrastructures. Currently, the most widely-used approach to data management for distributed grid computation relies on explicit data transfers between clients and computing servers. As an example, the Globus [12] platform provides data access mechanisms based on the GridFTP protocol [1]. Though this protocol provides authentication, parallel transfers, checkpoint/restart mechanisms, etc., it is still a transfer protocol which requires explicit data localization. On top of GridFTP, Globus integrates data catalogs [1], where multiple copies of the same data can be manually registered. Consistency issues, however, are left to the user. IBP [6] provides a large-scale data storage system, consisting of a set of buffers distributed over the Internet. Transfer management is still left to the user, and no consistency mechanisms are provided for the management of multiple copies of the same data. Finally, Stork [17] proposes an integrated approach allowing the user to schedule data placement just like computational jobs. Again, data location and transfer are left to the user.

Within the context of a growing number of applications using large amounts of distributed data, we claim that the explicit management of data locations by the programmer arises as a major limitation to the efficient use of modern, large-scale computational grids. Such a low-level approach makes grid programming extremely hard to manage. In contrast, we have proposed the concept of a data-sharing service for grid computing [2], which aims at providing transparent access to data. This approach is illustrated by the JUXMEM software experimental platform. The user only accesses data via a global identifier. The service handles data localization and transfer without any help from the programmer. However, it is able to use additional hints provided by the programmer, if any. The service also transparently uses adequate replication strategies and consistency protocols to ensure data availability and consistency.
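To make the transparent-access model concrete, here is a minimal, hypothetical sketch of such a client-side API (illustration only; this is not JUXMEM's actual interface, and the names `allocate`, `read`, `write` and the `hints` parameter are assumptions). The caller names data only by a global identifier; localization, transfer and replication remain the service's job.

```python
# Hypothetical sketch of a transparent data-sharing API (not JUXMEM's
# real interface). The in-memory dict stands in for the distributed,
# replicated storage the service would actually manage.

class DataSharingService:
    def __init__(self):
        self._store = {}   # placeholder for distributed replicated storage
        self._next = 0

    def allocate(self, data, hints=None):
        # Returns a global identifier. 'hints' models the optional
        # placement/replication preferences a programmer may supply.
        gid = f"data-{self._next}"
        self._next += 1
        self._store[gid] = data
        return gid

    def read(self, gid):
        # The caller never names a host: lookup is by global id only.
        return self._store[gid]

    def write(self, gid, data):
        self._store[gid] = data

svc = DataSharingService()
gid = svc.allocate([1.0, 2.0, 3.0], hints={"replicas_per_cluster": 2})
svc.write(gid, [1.0, 2.0, 4.0])
```

The point of the sketch is the contract, not the implementation: no call takes a node address, so data placement can change (or be replicated) without touching client code.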
These mechanisms target a large-scale, dynamic grid architecture. In particular, the service supports events such as storage resources joining and leaving, or unexpectedly failing. This is the framework within which we conducted the study presented in this paper.

Handling consistency of replicated data. The goal of a data-sharing service is to allow grid applications to access data in a distributed environment. We are considering scientific applications, typically exhibiting a code-coupling scheme: e.g., multiple weakly-coupled codes running on different sites and cooperating via periodical data exchanges. In such applications, shared data are mutable: they can be read, but also updated by the different codes. When accessed on multiple sites, data are often replicated to enhance access locality. Replication is equally used for fault tolerance, since grid nodes may crash. To ensure that read operations do not return obsolete data, consistency guarantees have to be provided by the data service. These guarantees are defined via consistency models and are implemented using consistency protocols.

Difficulty: handling consistency in a dynamic context. The problem of sharing mutable data in distributed environments has been intensively studied during the past 15 years within the context of Distributed Shared Memory (DSM) systems [18, 20]. These systems provide transparent data sharing, via a unique address space accessible to physically distributed machines. When the nodes modify the data, some consistency action is triggered (e.g., invalidation or update), according to some consistency protocol. A large variety of DSM consistency models and protocols [7, 13, 15, 20, 23] have been defined, their role being to specify which remote nodes have to be notified of the modification, and when. They provide various trade-offs between the strength of the consistency guarantees and the efficiency of the implementation.
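As a concrete illustration of such a consistency action, here is a minimal sketch of a home-based write-invalidate scheme in Python. This is not the paper's protocol, only a generic example under simplifying assumptions (single data item, no concurrency): the home node tracks which nodes hold a valid replica and invalidates all other copies when a writer commits.

```python
class HomeNode:
    """Illustrative home-based write-invalidate protocol (toy example).

    The home node keeps the reference copy of one piece of data and
    tracks which nodes currently cache a valid replica of it.
    """

    def __init__(self, value=None):
        self.value = value
        self.valid_replicas = set()   # ids of nodes holding a valid copy

    def read(self, node_id):
        # A reader fetches the reference copy and becomes a valid replica.
        self.valid_replicas.add(node_id)
        return self.value

    def write(self, node_id, new_value):
        # On a write, every *other* replica must be notified: here we
        # invalidate them, so their next read goes back to the home node.
        invalidated = self.valid_replicas - {node_id}
        self.value = new_value
        self.valid_replicas = {node_id}
        return invalidated  # in a real system: send invalidation messages

home = HomeNode(0)
home.read("A")
home.read("B")
stale = home.write("B", 42)   # node A's cached copy must be invalidated
```

An update-based protocol would differ only in the notification step: instead of shrinking `valid_replicas`, it would push `new_value` to all of them, trading write bandwidth for read locality.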
However, traditional DSM systems have generally demonstrated satisfactory efficiency (i.e., near-linear speedups) only on small-scale configurations: in practice, up to a few tens of nodes [20]. This is often due to the intrinsic lack of scalability of the algorithms used to handle data consistency. Most of the time, they rely on global invalidations or global updates of all existing data copies. On the other hand, an overwhelming majority of protocols assume a static configuration where nodes neither disconnect nor fail. It is clear that these assumptions no longer hold in the context of a large-scale, dynamic grid infrastructure. Faults are no longer exceptions, but become part of the general rule; resources may become unavailable and eventually become available again; finally, new resources can dynamically join the infrastructure. In such a context, consistency protocols can no longer rely on entities supposed to be stable, as was traditionally the case. A new approach to their design is definitely necessary, to integrate these new hypotheses.

This idea is at the core of the design of our grid data-sharing service [2], which we have defined as a hybrid system inspired by DSM systems (for transparent access to data and consistency management) and P2P systems (for their scalability and volatility tolerance). In this paper, we propose an approach allowing consistency protocols to take fault tolerance into account by decoupling the management of these two aspects. The motivations and the general principles are presented in Section 2. In Section 3 we describe the detailed architecture and we show how to use traditional group communication components of fault-tolerant distributed systems [11, 19] as building blocks for consistency protocols. We illustrate the approach in Section 4, with a case study explaining the design of a fault-tolerant consistency protocol. Section 5 shows how this protocol has been implemented in the JUXMEM platform and presents a preliminary experimental evaluation. Some concluding remarks and future directions are given in Section 6.

2.
Approach: decoupling fault tolerance management from consistency management

Let us first note that both fault-tolerance mechanisms and consistency protocols are traditionally implemented using replication. However, the underlying motivations are totally different for each of the two uses.

Replication in consistency protocols. Consistency protocols use data replication for performance reasons, to allow multiple nodes to read the same data in parallel via local accesses. However, when a node modifies a data copy, the consistency protocol is activated: e.g., the other copies must be updated or invalidated, to prevent subsequent read operations from returning invalid data. Note that P2P systems also use replication to enhance access locality, but most of them do not address consistency issues, since data is generally immutable.

Replication for fault tolerance. Replication is also commonly used by fault-tolerance mechanisms [21] to enhance availability in an environment with failures. When a node hosting a data copy crashes, other copies can be made available by other nodes. Various replication strategies have been studied [14], leading to various trade-offs between efficiency and the level of fault tolerance guaranteed.

In distributed systems where both consistency and fault tolerance need to be handled, replication can be used with a double goal. Consequently, depending on whether these two issues are addressed separately or not, two architectural designs are possible.

Integrated design. A possible approach consists in addressing consistency and fault tolerance at the same time, relying on the same set of data replicas. For instance, data copies created by the consistency protocol to enhance data locality can serve as backups if crashes occur. Conversely, backup replicas created for fault tolerance can be used by the consistency protocol. This approach has a major disadvantage: the design of the corresponding software layer is very complex, as illustrated by some fault-tolerant DSM systems [16, 22].
Decoupled design. A different approach consists in designing the consistency protocol and the fault-tolerance mechanism separately. This approach has several advantages. First, the design of consistency protocols is simplified, since the protocols do not have to address fault-tolerance issues at a low level. Therefore, it is possible to leverage existing consistency protocols. Only some limited interaction between the consistency protocol and the fault-tolerance mechanism needs to be defined (see Section 3.2). Second, consistency protocols and fault-tolerance strategies can be developed independently. This favors a cleaner design, each of the two components being dedicated to its specific role. Finally, this approach provides the ability to experiment with multiple ways of coupling various consistency protocols with various fault-tolerance strategies.

The goal of this paper is to discuss how to manage consistency and fault tolerance at the same time, in a decoupled way, using this second approach.

3. Using fault-tolerant components as building blocks for consistency protocols

Traditional consistency protocols for DSM systems rely on stable entities in order to guarantee that data accesses are correctly satisfied. For instance, a large number of protocols associate to each piece of data a node holding its most recent copy. This is true for the very first protocols for sequential consistency [18], but also for recent home-based protocols implementing lazy release consistency [23] or scope consistency [15], where a home node is in charge of maintaining a reference data copy. It is important to note that these protocols implicitly assume that the home node never fails.

Such an assumption cannot be made in a dynamic grid environment, where faults may occur.
In such a context, the role of the home node has to be played by an entity able to transparently react to faults and disconnections, in order to maintain a given degree of availability for the reference data copy. We propose to design such entities using basic building blocks that have been defined within the context of fault-tolerant distributed systems [11, 19]: group membership protocols, atomic multicast, consensus, etc. We briefly introduce these blocks in Section 3.1. Then, Section 3.2 describes the “glue layers” through which the consistency protocol interacts with these fault-tolerant blocks.

3.1. Basic fault-tolerant components

We consider that two types of faults need to be addressed in a grid environment. First, nodes may crash, i.e., nodes act normally (receive and send messages according to their specification) until they fail (crash failures). Second, we assume messages can be delayed or lost (omission failures). We consider two main timing aspects: the communication delays and the computation times. We make the assumption that upper bounds on these times exist but are not known. Classical fault-tolerance mechanisms are often built on these hypotheses, which are realistic in a grid context.

The group membership abstraction [11] is such a mechanism, providing the ability to manage a set of nodes running the same service. In our case, it applies to a group of nodes that together play the role of the home node. Requests sent to the group need to be delivered in the same order to all group members. This property is commonly called atomic multicast. As members of the group have to agree upon an order for message delivery, atomic multicast can be built using a consensus protocol. The consensus problem in asynchronous systems can be solved thanks to unreliable failure detectors [10]. The role of these detectors is to provide to higher layers a list of nodes suspected to be faulty. The consensus protocol can cope with the approximate accuracy of the list contents.

These blocks can interact with each other in many ways.
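The relationship between consensus and atomic multicast can be sketched with a deliberately simplified toy (illustration only, not a real group-communication stack): members of a “home” group each receive requests in possibly different orders, then a single consensus decision fixes one total delivery order for everyone. Here consensus is simulated by deterministically picking the first proposal; a real system would run a fault-tolerant consensus protocol over unreliable failure detectors.

```python
def consensus(proposals):
    # Simulated consensus: all correct members decide the same value;
    # here, trivially the first proposal. A real implementation would
    # tolerate crashes and wrong suspicions.
    return proposals[0]

class GroupMember:
    def __init__(self, name):
        self.name = name
        self.pending = []    # requests received, order not yet agreed
        self.delivered = []  # requests delivered in the agreed total order

    def receive(self, request):
        self.pending.append(request)

def run_round(group):
    # Each member proposes its pending batch; one consensus decision
    # fixes the delivery order for every member (atomic multicast).
    decided = consensus([m.pending for m in group])
    for m in group:
        m.delivered.extend(decided)
        m.pending = []

group = [GroupMember("n1"), GroupMember("n2")]
# The same two requests arrive in different orders at the two members...
group[0].receive("write x=1")
group[0].receive("read x")
group[1].receive("read x")
group[1].receive("write x=1")
run_round(group)
# ...yet both members deliver them in one agreed order.
```

This is exactly the property the home-node group needs: whatever order requests arrive in, every member applies them identically, so the replicated reference copy stays consistent.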
In this paper, we consider a layered, decoupled design (Figure 1), inspired by [19]. Here, the adapter module allows higher-level software layers to register to the failure detection service and to filter the list of suspected nodes according to some user-specified quality of service, as in [8].

[Figure 1: An architecture for group communication and group membership protocols. An Adapter module sits alongside a layered stack: Group communication and group membership (join/leave/get view/send/receive), Atomic Multicast (multicast/receive), Consensus (propose/decide), and, at the bottom, an Unreliable Failure Detector (get suspect list) next to Unreliable Communications (send/receive).]

3.2. A decoupled architecture: the big picture

Our idea is to use the abstractions described above to build fault-tolerant entities able to play the role of critical entities in consistency protocols. For instance, each home node can be replaced by a group of nodes handled via a group membership interface and supporting atomic multicast. However, some actions like 1) group self-organization or 2) configuration of new group members need to be handled by higher-level layers. Such actions are not necessarily specific to particular consistency protocols (i.e., they can apply to several consistency protocols). They are situated precisely at the “boundary” between fault-tolerance management and consistency management. Hence the need to introduce two interface layers in our architecture, as shown in Figure 2.

The Proactive Group Membership layer handles the composition of a group of nodes that together act as a home node. The layer decides when to remove from the group nodes reported to be faulty, by parameterizing the QoS of the failure detector. It also removes nodes that notify it of their future disconnection. Following such removals, the layer adds new members to the group, to maintain the availability of the home node.
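The adapter's filtering role can be illustrated with a hypothetical sketch (the paper gives no such code; the class and method names are assumptions). One simple QoS knob is a minimum suspicion duration: the adapter exposes only nodes the underlying detector has suspected for long enough, trading detection latency against the rate of wrong suspicions.

```python
# Hypothetical sketch of an adapter over an unreliable failure detector.
# It records when each node entered the detector's raw suspect list and
# filters that list by a per-client QoS parameter (minimum duration).

class FailureDetectorAdapter:
    def __init__(self):
        self.first_suspected = {}  # node id -> time it entered the raw list

    def update(self, raw_suspects, now):
        # Track when each node was first suspected; forget nodes the
        # detector no longer suspects (they were wrongly suspected).
        for node in raw_suspects:
            self.first_suspected.setdefault(node, now)
        for node in list(self.first_suspected):
            if node not in raw_suspects:
                del self.first_suspected[node]

    def get_suspect_list(self, now, min_duration):
        # QoS filter: report only nodes suspected for at least min_duration.
        return {n for n, t in self.first_suspected.items()
                if now - t >= min_duration}

adapter = FailureDetectorAdapter()
adapter.update({"n3"}, now=0)          # n3 just became suspected
adapter.update({"n3", "n5"}, now=5)    # n5 enters the raw list later
```

A cautious client (e.g., one about to trigger an expensive group reconfiguration) would query with a large `min_duration`, while a latency-sensitive client would accept more false suspicions with a small one.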
To do so, it takes into account constraints specified at allocation time: the necessary memory size, the network performance, or the replication policy (expressed in terms of the number of clusters across which to spread data replicas, the number of replicas per cluster, etc.). Various trade-offs can be expressed at this level (e.g., smaller group sizes to enhance communication efficiency vs. larger group sizes to increase the level of fault tolerance).

When a new node is added to the group that acts as a home node, the newcomer has to initialize its state in order to be consistent with the state of the other members of the group. The Dynamic Consistency Protocol Configuration layer defines how to instantiate a consistency protocol