Peer-to-Peer Distributed Shared Memory?
Gabriel Antoniu, Luc Bougé, Mathieu Jan
To cite this version:
Gabriel Antoniu, Luc Bougé, Mathieu Jan. Peer-to-Peer Distributed Shared Memory?. [Research
Report] RR-4924, INRIA. 2003. inria-00071655
HAL Id: inria-00071655
https://hal.inria.fr/inria-00071655
Submitted on 23 May 2006
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
Peer-to-Peer Distributed Shared Memory?
Gabriel Antoniu, Luc Bougé, Mathieu Jan
N˚ 4924
Septembre 2003
THÈME 1
Theme 1 — Networks and systems
Projet Paris
Research report n˚ 4924 — September 2003 — 14 pages
Abstract: In this paper, we show that, although P2P systems and DSM systems have been designed in rather different contexts, both can serve as major sources of inspiration for the design of a hybrid system, with intermediate hypotheses and properties. We propose the concept of a data sharing service for grid computing, as a compromise between DSM systems and P2P systems. The main contribution of such a service is to decouple data management from grid computation, by providing location transparency as well as data persistence in a dynamic environment. To illustrate the proposed concept and validate its feasibility, we have implemented a software prototype: the JUXMEM platform.
Key-words: distributed shared memory, peer-to-peer, data sharing, grid, JUXMEM
(Résumé : tsvp)
Unité de recherche INRIA Rennes
IRISA, Campus universitaire de Beaulieu, 35042 RENNES Cedex (France)
Téléphone : 02 99 84 71 00 - International : +33 2 99 84 71 00
Télécopie : 02 99 84 71 71 - International : +33 2 99 84 71 71
Peer-to-peer distributed shared memory
Résumé: In this paper, we show that even though peer-to-peer (P2P) systems and distributed shared memory (DSM) systems were designed in different contexts, they can be an important source of inspiration for the design of a hybrid system, with intermediate hypotheses and properties. We propose the concept of a data sharing service for grid computing, a compromise between DSM systems and P2P systems. The main contribution of such a service is to decouple data management from grid computation, by providing transparent localization as well as persistent storage of data in a dynamic environment. To illustrate the proposed concept and validate its feasibility, we have implemented a prototype: the JUXMEM platform.
Mots-clé: distributed shared memory, peer-to-peer, data sharing, grid, JUXMEM
1 A peer-to-peer DSM?

Peer-to-peer [18, 16] (P2P) computing has recently attracted growing interest within the distributed computing community. This is mainly due to the success of file sharing systems like Napster [36], Gnutella [19] or KaZaA [35], which have proven the adequacy of the peer-to-peer approach for data sharing on highly dynamic, large-scale configurations (millions of nodes). Several research projects on P2P systems are currently in progress; however, most of them have focused on devising smart mechanisms for efficient sharing of immutable, read-only data at a very large scale. Only a few systems (e.g. Oceanstore [14], Ivy [17]) have started to address the issue of sharing mutable data, but the current solutions have proven efficient only in special cases: they generally assume few writers or few data modifications.

On the other hand, in today's scientific applications, data can generally be read, but also modified by multiple sites. To handle the consistency of replicated data, a lot of models and protocols have been proposed within the context of DSM systems (Distributed Shared Memory [21]). However, let us note that these systems have been designed by assuming a static, small-scale architecture (typically, a cluster of PCs).

In this paper, we show that, although P2P systems and DSM systems have been designed in rather different contexts, both can serve as major sources of inspiration for the design of a hybrid system, with intermediate hypotheses and properties.
2 Why combine DSM systems and P2P systems?

2.1 Peer-to-peer systems: high scalability on highly dynamic configurations

The peer-to-peer model has been made popular by systems allowing music files to be shared among millions of intermittently connected users (Napster, Gnutella, etc.). The underlying model of these systems is simple and complementary to the client-server model: the relations between machines are symmetrical; each node can be a client in one transaction and a server in another. It has thus been proven that such a model scales very well without any need for a centralized storage server for the shared files: each client node is equally a server and can provide files to the other nodes. As an example, within the KaZaA network, 4,500,000 simultaneously connected users share 900,000,000 files containing 9 petabytes of data. This high scalability has drawn the attention of the distributed systems community, since it shows a way to make an important step forward in this field. Traditional distributed systems based on the client-server model have often shown limited scalability, generally due to bottlenecks generated by the use of centralized servers. By removing these bottlenecks, the peer-to-peer model not only enhances the system's scalability, but also improves its fault tolerance and availability despite the high node volatility: the system's activity is no longer dependent on the availability of a single server.
These important properties explain why the peer-to-peer model has attracted the interest of the scientific distributed systems community. Within this context, the research efforts have mainly focused on devising efficient peer-to-peer localization and routing schemes [20, 25, 27, 29, 23, 12, 7], based on the use of distributed hash tables (DHTs). These schemes have been illustrated by systems like Chord [27], Tapestry [29] and Pastry [25], which serve as basic layers for higher-level data management systems, such as CFS [6], Oceanstore and PAST [26], respectively.
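The common idea behind these DHT-based localization schemes can be sketched in a few lines of code. The sketch below is an illustration only, not the actual Chord, Tapestry or Pastry algorithm; the `Ring` class, the SHA-1 hash and the 2^16-slot identifier space are our own illustrative assumptions. Each node and each data key is hashed onto an identifier circle, and a key is stored on the first node clockwise from its position, so a node's arrival or departure only remaps the keys that node owns.

```python
import hashlib
from bisect import bisect_right

def point(name: str) -> int:
    """Hash a node name or data key onto a small identifier circle."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % 2**16

class Ring:
    """Toy consistent-hashing ring, in the spirit of DHT-based localization."""
    def __init__(self, nodes):
        # Sorted (position, node) pairs represent the circle.
        self.points = sorted((point(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        """Return the node responsible for key: its clockwise successor."""
        i = bisect_right(self.points, (point(key),)) % len(self.points)
        return self.points[i][1]
```

A consequence worth noting is the locality of remapping: removing any node other than a key's owner leaves that key's owner unchanged, which is what makes such schemes suitable for volatile node populations.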
On the other hand, we can note that these systems focus on sharing immutable files: the shared data are read-only and can be replicated at ease, without any limit on the number of copies. It is easy to understand that, if the data were modifiable, a mechanism would be necessary to handle the consistency of the data copies. But, by guaranteeing consistency, the system would set an undesired limit to scalability: the more data copies, the higher the consistency overhead. Therefore, most peer-to-peer systems make a compromise: they favor scalability by sacrificing data mutability.
Recently, some mechanisms for sharing mutable data in a peer-to-peer environment have been proposed by systems like OceanStore, Ivy and P-Grid [8]. In OceanStore, for each piece of data, only a small set of primary replicas, called the inner ring, agrees on, serializes and applies updates. Updates are then multicast down a dissemination tree to all other cached copies of the data, called secondary replicas. However, OceanStore uses a versioning mechanism which has not proven to be efficient at large scales, as published measurements [24] on the performance of updates assume a single writer per data block. The Ivy system has one main limitation: applications have to repair conflicting writes, so the number of writers per datum is equally very limited. P-Grid proposes a flooding-based algorithm for updating data, but assumes no conflicting writes; besides, no experimental results have been published so far for this system. We can clearly conclude that handling consistency is a serious problem for peer-to-peer systems: the preliminary solutions tentatively proposed as of today have the significant drawback of limiting system scalability, which is the main property that makes peer-to-peer systems interesting.
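The primary/secondary replica scheme described above can be sketched as follows. This is a loose illustration of the general idea, not OceanStore's actual protocol; the class names and the version-stamping logic are our own assumptions. The primary serializes concurrent writes by assigning each one a version number, and secondary (cached) copies apply only updates newer than what they already hold, so late or duplicated dissemination messages cannot roll a copy back.

```python
class PrimaryReplica:
    """Serializes concurrent updates by stamping them with a version number."""
    def __init__(self):
        self.version = 0
        self.value = None
        self.secondaries = []

    def update(self, value):
        self.version += 1                  # one global order for all writes
        self.value = value
        for s in self.secondaries:         # disseminate to the cached copies
            s.apply(self.version, value)
        return self.version

class SecondaryReplica:
    """A cached copy: applies only updates newer than what it already holds."""
    def __init__(self):
        self.version, self.value = 0, None

    def apply(self, version, value):
        if version > self.version:         # stale or duplicate updates are ignored
            self.version, self.value = version, value
```

The sketch also makes the scalability concern visible: every update costs one message per secondary replica, which is exactly the "more copies, higher consistency overhead" trade-off discussed above.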
2.2 Distributed Shared Memory: consistency and transparency

The problem of sharing mutable data in distributed environments has been intensively studied during the past fifteen years within the context of Distributed Shared Memory (DSM) systems [15, 21, 2]. These systems provide transparent data sharing, via a unique address space accessible to physically distributed machines. As in the case of peer-to-peer systems, reading data on multiple nodes may result in data replication. But the DSM nodes can also modify the data, and this results in triggering some consistency action (e.g. invalidation or update), according to some consistency protocol which implements a given semantics stated by some consistency model. A large variety of DSM consistency models and protocols [21, 10, 5, 4, 11, 30] have been defined [22], in order to provide different compromises between the strength of the consistency guarantees and the efficiency of the consistency actions. These efforts have been carried out within the context of research on high performance parallel computing, often with the goal of providing maximum transparency at a minimum cost.

A central feature of DSM systems is transparency. First, these systems provide transparent access to data: all nodes can read and write any shared data in a uniform way, whether the data is local or remote. The DSM system internally checks for data locality and takes the appropriate action in order to satisfy the access. Second, DSM systems also provide transparent localization: if the program accesses remote data, it is the responsibility of the DSM system to localize, transfer or replicate it locally, according to the corresponding consistency protocol.
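Both forms of transparency can be illustrated with a minimal sketch (invented names, and a directory-based design chosen for brevity; real DSM systems work at the memory-page or object level with far more machinery). The caller issues the same `read` whether the page is local or remote; on a miss, the node consults a directory to localize the page, transfers it, and replicates it locally.

```python
class DsmNode:
    """Toy DSM node: reads look the same whether the page is local or remote."""
    def __init__(self, name, directory):
        self.name = name
        self.pages = {}             # pages held locally (home copies or replicas)
        self.directory = directory  # shared map: page id -> home node

    def read(self, page_id):
        if page_id not in self.pages:                   # remote access:
            home = self.directory[page_id]              # localize the page,
            self.pages[page_id] = home.pages[page_id]   # transfer and replicate it
        return self.pages[page_id]                      # local from now on
```

After the first read, the data is served locally, which is precisely the replication behavior that the consistency protocol must later manage on writes.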
However, existing DSM systems have generally shown satisfactory efficiency (i.e. near-linear speedups) only on small-scale configurations (in practice, up to a few tens of nodes [22]). This is often due to the intrinsic lack of scalability of the algorithms used to handle data consistency. These algorithms have often been designed by assuming a small number of copies per shared datum. For instance, Multiple-Reader-Single-Writer algorithms [15] clearly cannot perform well at large scale, since any write operation on some data results in an expensive invalidation of all existing data copies. In the same way, Home-Based, Multiple-Writer algorithms [30] also rely on having the home node centralize and merge data modifications from all writers. Moreover, an overwhelming majority of protocols assume a static configuration where nodes neither disconnect nor fail: the unique writer of a given datum is not supposed to go down, nor is the home node in a home-based DSM. Only a few DSM systems have integrated some mechanisms for fault tolerance [28, 13]. However, node failures are supposed to be infrequent and are considered as exceptional behavior. This is to be contrasted with the basic hypotheses of peer-to-peer systems, in which nodes are assumed to join and leave the network at any time, as a regular behavior. Therefore, we can conclude that making a DSM highly scalable and adaptive to dynamic configurations is a real challenge, since it conflicts with the founding properties of traditional DSM systems.
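The Multiple-Reader-Single-Writer scaling argument can be made concrete with a deliberately simplified model (the names below are invented for illustration): before a write proceeds, every outstanding read copy must be invalidated, so the cost of a single write grows linearly with the number of replicas.

```python
class MrswPage:
    """Multiple-Reader-Single-Writer: a write invalidates every read copy."""
    def __init__(self, value):
        self.value = value
        self.copies = set()       # nodes currently holding a read-only copy
        self.invalidations = 0    # invalidation messages sent (the cost we track)

    def read(self, node):
        self.copies.add(node)     # the node gets, and keeps, a read-only copy
        return self.value

    def write(self, node, value):
        # Revoke every other copy before granting exclusive write access.
        self.invalidations += len(self.copies - {node})
        self.copies = {node}
        self.value = value
```

With a hundred readers, one write triggers a hundred invalidation messages; at P2P scales (millions of potential copies), this per-write cost is clearly prohibitive, which is the point made above.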
2.3 Hybrid approach: a data sharing service for scientific computing

Although P2P systems and DSM systems have been designed in rather different contexts, we think both can serve as major sources of inspiration for the design of a hybrid data sharing system. If DSM systems can usually handle configurations of tens or hundreds of nodes, corresponding to cluster computing, peer-to-peer systems generally target configurations of millions of nodes, corresponding to the scale of the Internet. The hybrid data sharing system we propose targets configurations of thousands to tens of thousands of nodes, which corresponds precisely to the scale of grid computing [9].

                          DSM            Grid data service        P2P
Scale                     10^1 - 10^2    10^3 - 10^4              10^5 - 10^6
Topology                  Flat           Hierarchical             Flat
Resource control          High           Medium                   Null
  and trust degree
Dynamicity                Null           Medium                   High
Resource homogeneity      Homogeneous    Rather heterogeneous     Heterogeneous
                          (clusters)     (clusters of clusters)   (Internet)
Data type                 Mutable        Mutable                  Immutable
Application complexity    Complex        Complex                  Simple
Typical applications      Scientific     Scientific computation   File sharing
                          computation    and data storage         and storage

Table 1: A grid data sharing service as a compromise between DSM and P2P systems.
Therefore, we think the adequate approach for the design of such a system is not to build a peer-to-peer DSM, nor a shared-memory peer-to-peer system, but rather a data sharing service for grid computing. Such a service has to address the problem of managing mutable data on dynamic, large-scale configurations. The approach we propose benefits both from DSM systems (transparent access to data, consistency protocols) and from P2P systems (scalability, support for resource volatility and dynamicity).

These two classes of systems have been designed and studied in very different contexts. In DSM systems, the nodes are generally under the control of a single administration, and the resources are trusted. In contrast, P2P systems aggregate resources located at the edge of the Internet, with no trust guarantee and loose control. Moreover, these numerous resources are essentially heterogeneous in terms of processors, operating systems and network links, as opposed to DSM systems, where nodes are generally homogeneous. Finally, DSM systems are typically used to support complex numerical simulation applications, where data are accessed in parallel by multiple nodes. In contrast, P2P systems generally serve as a support for storing and sharing immutable files. These antagonistic features are summarized in the first and third columns of Table 1.
A data sharing service targets physical architectures with intermediate features between those of DSM and P2P systems. It addresses scales of the order of thousands or tens of thousands of nodes, organized as a federation of clusters, say tens or hundreds of hundred-node clusters. At a global level, the resources are thus rather heterogeneous, while they can probably be considered as homogeneous within the individual clusters. The control degree and the trust degree are also intermediate, since the clusters may belong to different administrations, which set up agreements on the sharing protocol. Finally, the service targets numerical applications like heavy simulations, built by coupling individual codes. These simulations process large amounts of data, with significant requirements in terms of data storage and sharing. These intermediate features are illustrated in the second column of Table 1.

The main contribution of such a service is to decouple data management from grid computation, by providing location transparency as well as data persistence in a dynamic environment. As explained in the scenarios described below, such a service can prove helpful for heavy numerical simulations, based on code coupling, with significant requirements in terms of data storage and sharing.
3 A data sharing service for the grid: sample scenarios

Persistence. Since grid applications can handle large masses of data, data transfer among sites can be costly, in terms of both latency and bandwidth. In order to limit these data exchanges, the data sharing service has to rely on strategies able to 1) reuse previously produced data; 2) trigger "smart" pre-fetching actions to anticipate future accesses; and 3) provide useful information on data location to the task scheduler, in order to optimize the global execution cost.

Let us consider the following scenario, which illustrates the first point mentioned above. A client submits a computation f1(A, B) to the grid infrastructure. The execution is scheduled on server S. To run this computation, the client needs to submit A and B (which may be large matrices) to S. At the end of the computation, the result C is transferred from S to the client. Let us now assume that the client has to execute a second computation f2(C, A) on the same server. To do this, the client would have to resubmit C and A to S. To avoid such unnecessary transfers, the data sharing service has to provide persistence, by allowing the matrices already present on the storage infrastructure to be reused. The client should be able to characterize the persistence guarantees for any data stored by the service.
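The savings at stake in this scenario can be mimicked with a toy store (an illustration of the reuse argument only, with invented names; this is not the JUXMEM API): data submitted under an identifier is transferred only if the infrastructure does not already hold it, and results stay stored, so the second computation reuses A and C without new transfers.

```python
class ToyDataService:
    """Counts transfers so the savings from persistence are visible."""
    def __init__(self):
        self.store = {}
        self.transfers = 0

    def submit(self, name, data):
        """Transfer data to the infrastructure unless it is already stored."""
        if name not in self.store:
            self.transfers += 1
            self.store[name] = data

    def compute(self, f, result_name, *names):
        """Run a computation on stored data, keeping the result stored too."""
        result = f(*(self.store[n] for n in names))
        self.store[result_name] = result
        return result
```

With persistence, the whole two-computation scenario costs exactly two transfers (A and B); without it, A and B would be shipped again and C shipped back and forth.

```python
svc = ToyDataService()
svc.submit("A", [[1, 2]])
svc.submit("B", [[3, 4]])
svc.compute(lambda a, b: a + b, "C", "A", "B")   # f1(A, B), result C stays stored
svc.submit("A", [[1, 2]])                        # already present: no transfer
svc.compute(lambda c, a: c + a, "D", "C", "A")   # f2(C, A) reuses C and A in place
```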