Enabling High Data Throughput in Desktop Grids Through Decentralized Data and Metadata Management: The BlobSeer Approach

Bogdan Nicolae¹, Gabriel Antoniu², and Luc Bougé³

¹ University of Rennes 1, IRISA, Rennes, France
² INRIA, Centre Rennes-Bretagne Atlantique, IRISA, Rennes, France
³ ENS Cachan/Brittany, IRISA, France

Euro-Par 2009, TU Delft, Aug 2009, Delft, Netherlands, pp. 404-416. DOI: 10.1007/978-3-642-03869-3_40. HAL Id: inria-00410956v1, https://hal.inria.fr/inria-00410956v1. Submitted on 25 Aug 2009 (v1), last revised 22 Oct 2009 (v2).

Abstract. Whereas traditional Desktop Grids rely on centralized servers for data management, some recent progress has been made to enable distributed, large input data, using peer-to-peer (P2P) protocols and Content Distribution Networks (CDN). We make a step further and propose a generic, yet efficient data storage which enables the use of Desktop Grids for applications with high output data requirements, where the access grain and the access patterns may be random. Our solution builds on a blob management service enabling a large number of concurrent clients to efficiently read/write and append huge data that are fragmented and distributed at a large scale. Scalability under heavy concurrency is achieved thanks to an original metadata scheme using a distributed segment tree built on top of a Distributed Hash Table (DHT). The proposed approach has been implemented and its benefits have successfully been demonstrated within our BlobSeer prototype on the Grid'5000 testbed.

1 Introduction

During recent years, Desktop Grids have been extensively investigated as an efficient way to build cheap, large-scale virtual supercomputers by gathering idle resources from a very large number of users. Rather than relying on clusters of workstations belonging to institutions and interconnected through dedicated, high-throughput wide-area interconnects (which is the typical physical infrastructure for Grid Computing), Desktop Grids rely on desktop computers from individual users, interconnected through the Internet.
Volunteer computing is a form of distributed computing in which Desktop Grids are used for projects of public interest: the infrastructure is built thanks to volunteers that accept to donate their idle resources (cycles, storage) in order to contribute to some public cause. These highly-distributed infrastructures can typically benefit embarrassingly-parallel, computationally intensive applications, where the workload corresponding to each computation can easily be divided into a very large set of small independent jobs that can be scattered across the available computing nodes. In a typical (and widely used) setting, Desktop Grids rely on a master/worker scheme: a central server (playing the master role) is in charge of distributing the small jobs to many volunteer workers that have announced their availability. Once the computations are complete, the results are typically gathered on the central (master) server, which validates them, then builds a global solution. BOINC [1], XtremWeb [2] and Entropia [3] are examples of such Desktop Grid systems.

The initial, widespread usage of Desktop Grids for parallel applications consisting in non-communicating tasks with small input/output parameters is a direct consequence of the physical infrastructure (volatile nodes, low bandwidth), unsuitable for communication-intensive parallel applications with high input or output requirements. However, the increasing popularity of volunteer computing projects has progressively led to attempts to enlarge the set of application classes that might benefit from Desktop Grid infrastructures. If we consider distributed applications where tasks need very large input data, it is no longer feasible to rely on classic centralized server-based Desktop Grid architectures, where the input data is typically embedded in the job description and sent to workers: such a strategy could lead to significant bottlenecks as the central server gets overwhelmed by download requests. To cope with such data-intensive applications, alternative approaches have been proposed, with the goal of offloading the transfer of the input data from the central servers to the other nodes participating in the system, which have potentially under-used bandwidth.

Two approaches follow this idea. One of them adopts a P2P strategy, where the input data gets spread across the distributed Desktop Grid (on the same physical resources that serve as workers) [4]. A central data server is used as an initial data source, from which data is first distributed at a large scale. The workers can then download their input data from each other when needed, using for instance a BitTorrent-like mechanism. An alternative approach [4] proposes to use Content Distribution Networks (CDN) to improve the available download bandwidth by redirecting the requests for input data from the central data server to some appropriate surrogate data server, based on a global scheduling strategy able to take into account criteria such as locality or load balancing. The CDN approach is more costly than the P2P approach (as it relies on a set of data servers); however, it is potentially more reliable (as the surrogate data servers are supposed to be stable enough).
In this paper, we make a step further and consider using Desktop Grids for distributed applications with high output data requirements. We assume that each such application consists of a set of distributed tasks that produce and potentially modify large amounts of data in parallel, under heavy concurrency conditions. Such characteristics are featured by 3D rendering applications, or massive data processing applications that produce data transformations. In such a context, a new approach to data management is necessary, in order to cope with both input and output data in a scalable fashion. Very recent efforts have partially addressed the above issues from the perspective of the checkpoint problem: the stdchk system [5] proposes to use the Desktop Grid infrastructure to store the (potentially large) checkpoints generated by Desktop Grid applications. This proposal is however specifically optimized for checkpointing, where large data units need to be written sequentially. It relies on a centralized metadata management scheme which becomes a potential bottleneck when data access concurrency is high. Related work has been carried out in the area of parallel and distributed file systems [6-8] and archiving systems [9]: in all these systems the metadata management is centralized. We propose a generic, yet efficient storage solution allowing Desktop Grids to be used by applications that generate large amounts of data (not only for checkpointing!), with random access patterns and variable access grains, under potentially heavy concurrency. Our solution is based on BlobSeer, a blob management service we developed in order to address the issues mentioned above.

The main contribution of this paper is to demonstrate how the decentralized metadata scheme used in BlobSeer fits the needs of the considered application class, and to support our claims through extensive experimental evaluations. The paper is structured as follows. Section 2 gives an overview of our approach. Section 3 introduces the proposed architecture, with a focus on metadata management in Section 4. Extensive experimental evaluation is presented in Section 5. On-going and future work is discussed in Section 6.

2 Towards a decentralized architecture for data and metadata management in desktop grids

A sample scenario: 3D rendering. High resolution 3D rendering is known to be costly. Luckily, as rendering is embarrassingly parallel, both industry [10] and the research community [11] have an active interest in building render farms based on the Desktop Grid paradigm. In this context, one of the limitations encountered regards data management: both rendering input and output data reach huge sizes and are accessed in a fine-grain manner (frame-by-frame) under high contention (task-per-frame). From a more general perspective, we aim at providing efficient data management support on Desktop Grids for workloads characterized by: large input data, large output data, random access patterns, fine-grain data access for both reading and writing, and high access concurrency.

Requirements for data management. Given the workload characterization above, we can infer the following desirable properties:

Storage capacity and space efficiency. The data storage system should be able to store huge individual data blobs, for a large amount of data overall. Besides, as we assume the applications to be write-intensive, they may generate many versions of the data. Therefore, a space-efficient storage is highly desirable.

Read and write throughput under heavy concurrency. The data storage system should provide efficient and scalable support for input/output data accesses, as a large number of concurrent processes may potentially read and write the data.

Efficient fine-grain, random access.
As the grain and the access pattern may be random, the storage system should rely on a generic mechanism enabling efficient random fine-grain accesses within huge data blobs. We definitely favor this approach rather than relying on optimized sequential writes (as in [5]) to a large number of small files, for manageability reasons.

Versioning support. Providing versioning support is also a desirable feature, favoring concurrency, as some version of a data blob may be read while a new version is concurrently created. This fits the needs of checkpointing applications [5], but is also suitable for efficient pipelining through a sequence of transformations on huge amounts of data.

Fig. 1. Metadata and data management in Desktop Grids: centralized versus decentralized. (a) A classic Desktop Grid architecture; (b) a decentralized Desktop Grid architecture.

Global architecture overview. In a classic Desktop Grid architecture (Figure 1(a)), the workers directly interact with the central scheduler in order to request a job. Each job is defined by its input/output data, each of which can be downloaded/uploaded from a (typically centralized) location provided by the scheduler. This scheme has several major disadvantages. First, data is stored in a centralized way: this may lead to a significant potential bottleneck for data-intensive applications. Second, the whole metadata is managed by the central scheduler, which is put under high I/O pressure by all clients that concurrently read the metadata, and may potentially become an I/O bottleneck.

Whereas several recent approaches have proposed to decentralize data management in Desktop Grids through P2P or CDN-based strategies [4], to the best of our knowledge, none has explored the idea of decentralizing metadata. We specifically take this approach and propose an architecture based on the BlobSeer prototype, which implements an efficient lock-free, distributed metadata management scheme, in order to provide support for high-performance concurrent write accesses to data. The Desktop Grid architecture is modified as illustrated on Figure 1(b). The scheduler keeps only minimal input/output metadata information: the bulk of metadata is delegated to a set of distributed metadata providers. An overview of our approach is given in Section 3, with a focus on metadata management in Section 4. Note that a full description of the algorithms used is available in [12]: in this paper we briefly recall them, then we focus on the impact of distributing data and metadata.

3 The core of our approach: the BlobSeer service

BlobSeer at a glance. We have experimentally studied the approach outlined above using our BlobSeer prototype: a blob (Binary Large OBject) management service [13, 14]. To cope with very large data blobs, BlobSeer uses striping: each blob is made up of blocks of a fixed size psize, referred to as pages. These pages are distributed among storage space providers. Metadata facilitates access to a range (offset, size) for any existing version of a blob snapshot, by associating such a range with the physical nodes where the corresponding pages are located.

BlobSeer's client applications may update blobs by writing a specific range within the blob (WRITE). Rather than updating the current pages, each such operation generates a new set of pages corresponding to the offset and size requested to be updated. Clients may also APPEND new data to existing blobs, which also results in the creation of new pages. In both cases, metadata is then generated and "weaved" together with the old metadata in such a way as to create the illusion of a new incremental snapshot that actually shares the unmodified pages of the blob with the older versions. Thus, two successive snapshots v and v+1 physically share the pages that fall outside of the range of the update that generated snapshot v+1. Metadata is also partially shared across successive versions, as further explained below.
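To make this copy-on-write behavior concrete, here is a minimal Python sketch (our illustration, not BlobSeer's actual code: the `write` helper and the page-id scheme are hypothetical) of how a WRITE produces a new snapshot that shares every unmodified page with its predecessor:

    PSIZE = 4  # toy page size; BlobSeer uses a configurable psize

    def write(snapshots, offset, data):
        """Produce a new snapshot from the latest one, rewriting only
        the pages covered by [offset, offset + len(data))."""
        old = snapshots[-1] if snapshots else {}
        new = dict(old)  # share every page of the previous version by default
        first = offset // PSIZE
        last = (offset + len(data) - 1) // PSIZE
        for page in range(first, last + 1):
            # each updated page gets a fresh page id; the old page is left
            # untouched, so all previous versions remain readable
            new[page] = "page-{}-{}".format(len(snapshots), page)
        snapshots.append(new)
        return len(snapshots) - 1  # the new snapshot version number

    snaps = []
    write(snaps, 0, b"x" * 16)  # version 0: writes pages 0..3
    write(snaps, 8, b"y" * 8)   # version 1: rewrites pages 2..3 only
    assert snaps[0][0] == snaps[1][0]  # page 0 is shared by v0 and v1
    assert snaps[0][2] != snaps[1][2]  # page 2 was rewritten for v1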
Clients may access a specific version of a given blob through the READ primitive.

Metadata is organized as a segment-tree-like structure (see Section 4) and is scattered across the system using a Distributed Hash Table (DHT). Distributing data and metadata is the key choice in our design: it enables high performance through parallel, direct access I/O paths, as demonstrated in Section 5.

BlobSeer's architecture. Our storage infrastructure consists of a set of distributed processes.

Clients may CREATE blobs, then READ, WRITE and APPEND data to them. There may be multiple concurrent clients, and their number may dynamically vary in time.

Data providers physically store the pages generated by WRITE and APPEND. New data providers may dynamically join and leave the system. In a Desktop Grid setting, the same physical nodes which act as workers may also serve as data providers.

The provider manager keeps information about the available storage space and schedules the placement of newly generated pages according to a load balancing strategy.

Metadata providers physically store the metadata allowing clients to find the pages corresponding to a blob snapshot version. We use a distributed metadata management scheme to enhance data throughput through parallel I/O: this aspect is addressed in detail in Section 4. In a Desktop Grid setting, as with the data providers, metadata providers can be physically mapped to the physical nodes acting as workers.

The version manager is the key actor of the system. It registers update requests (APPEND and WRITE), assigns snapshot version numbers, and eventually publishes new blob versions, while guaranteeing total ordering and atomicity. In a Desktop Grid setting, this entity can be collocated with the centralized scheduler.

Reading data. To read data, clients need to provide a blob id, a specific version of that blob, and a range, specified by an offset and a size. The client first contacts the version manager. If the version has been published, the client then queries the metadata providers for the metadata indicating which providers store the pages corresponding to the required blob range. Finally, the client fetches the pages in parallel from the data providers. If the range is not page-aligned, the client may request only the required part of the page from the page provider.

Writing and appending data. To write data, the client first determines the number of pages that cover the range to be written. It then contacts the provider manager and requests a list of page providers able to store the pages. For each page in parallel, the client generates a globally unique page id, contacts the corresponding page provider and stores the contents of the page on it. After successful completion of this stage, the client contacts the version manager to register its update. The version manager assigns to this update a new snapshot version v and communicates it to the client, which then generates new metadata and "weaves" it together with the old metadata such that the new snapshot v appears as a standalone entity (details are provided in Section 4). Finally, the client notifies the version manager of success, and returns successfully to the user. At this point, the version manager is responsible for eventually publishing the version v of the blob. The APPEND operation is almost identical to the WRITE: the implicit offset is the size of the previously published snapshot version.
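The WRITE path just described can be summarized in a short, self-contained Python sketch. All class and method names below (`Provider`, `ProviderManager.allocate`, `VersionManager.register_update`, `blob_write`) are hypothetical local stand-ins for what BlobSeer implements as remote RPC interactions, and the page stores, which happen in parallel in the real system, are sequential here for brevity:

    import uuid

    PSIZE = 64 * 1024  # page size in bytes (configurable in BlobSeer)

    class Provider:  # toy in-memory data provider
        def __init__(self):
            self.pages = {}
        def store(self, pid, content):
            self.pages[pid] = content

    class ProviderManager:  # round-robin stand-in for load balancing
        def __init__(self, providers):
            self.providers, self.next = providers, 0
        def allocate(self, n):
            picked = [self.providers[(self.next + k) % len(self.providers)]
                      for k in range(n)]
            self.next += n
            return picked

    class VersionManager:  # assigns totally ordered snapshot versions
        def __init__(self):
            self.next_version, self.published = 0, set()
        def register_update(self, blob_id, offset, size):
            v = self.next_version
            self.next_version += 1
            return v
        def notify_success(self, blob_id, v):
            self.published.add((blob_id, v))  # eventually visible to readers

    def blob_write(blob_id, offset, data, pm, vm):
        """Client-side WRITE path of Section 3 (illustrative sketch)."""
        npages = (len(data) + PSIZE - 1) // PSIZE
        targets = pm.allocate(npages)             # 1. obtain page providers
        for i, prov in enumerate(targets):        # 2. store the pages (done in
            pid = uuid.uuid4().hex                #    parallel in the real system)
            prov.store(pid, data[i * PSIZE:(i + 1) * PSIZE])
        v = vm.register_update(blob_id, offset, len(data))  # 3. get version v
        # 4. here the client would generate new metadata and weave it with
        #    the old metadata (Section 4) before notifying the version manager
        vm.notify_success(blob_id, v)             # 5. v is eventually published
        return v

    pm = ProviderManager([Provider() for _ in range(4)])
    vm = VersionManager()
    assert blob_write("my-blob", 0, b"z" * (3 * PSIZE), pm, vm) == 0

The key property the sketch preserves is the ordering: pages are written first, the version number is obtained next, and the version manager alone decides when version v becomes visible to readers.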
4 Zoom on metadata management

Fig. 2. Metadata representation. (a) The metadata after a write of four pages; (b) the metadata after overwriting two pages; (c) the metadata after an append of one page.

Metadata stores information about the pages which make up a given blob, for each generated snapshot version. To efficiently build a full view of the new snapshot of the blob each time an update occurs, BlobSeer creates new metadata, rather than updating old metadata. As we will explain below, this decision significantly helps us provide support for heavy concurrency, as it favors independent concurrent accesses to metadata without synchronization.

The distributed metadata tree. We organize metadata as a distributed segment tree [15]: one such tree is associated to each snapshot version of a given blob id. A segment tree is a binary tree in which each node is associated to a range of the blob, delimited by offset and size. We say that the node covers the range (offset, size). For each node that is not a leaf, the left child covers the first half of the range, and the right child covers the second half. Each leaf covers a single page. We assume the page size psize is a power of two. Figure 2(a) depicts the structure of the metadata for a blob consisting of four pages. We assume the page size is 1. The root of the tree covers the range (0, 4), while each leaf covers exactly one page in this range.

To favor efficient concurrent access to metadata, tree nodes are stored on the metadata providers in a distributed way, using a simple DHT (Distributed Hash Table). Each tree node is identified uniquely by its version and the range specified by the offset and size it covers.

Sharing metadata across snapshot versions. Such a metadata tree is created when the first pages of the blob are written, for the range covered by those pages. To avoid the overhead (in time and space!) of rebuilding such a tree for the subsequent updates, we create new tree nodes only for the ranges that intersect with the range of the update. These new tree nodes are "weaved" with existing tree nodes generated by past updates (for ranges that do not intersect with the range of the update), in order to build a new consistent view of the blob, corresponding to a new snapshot version. Figure 2(b) shows how metadata evolves when pages 2 and 3 of the 4-page blob represented on Figure 2(a) are modified for the first time.

Expanding the metadata tree. APPEND operations make the blob "grow": consequently, the metadata tree gets expanded, as illustrated on Figure 2(c), where new metadata tree nodes are generated to take into account the creation of a fifth page by an APPEND operation.

Reading metadata. Metadata is accessed during a READ operation in order to find out what pages fully cover the requested range R. This involves traversing down the segment tree, starting from the root that corresponds to the requested snapshot version. To start the traversal, the client gets the root from the version manager, which is responsible for storing the mapping between all snapshot versions and their corresponding roots. A node N that covers segment R_N is explored if the intersection of R_N with R is not empty. Since tree nodes are stored in a DHT, exploring a node involves fetching it from the DHT. All explored leaves reached this way provide the information needed to fetch the contents of the pages from the data providers.

Writing metadata. For each update (WRITE or APPEND) producing a snapshot version v, it is necessary to build a new metadata tree (possibly sharing nodes with the trees corresponding to previous snapshot versions). This new tree is the smallest (possibly incomplete) binary tree such that its leaves are exactly the leaves covering the pages of the range that is written. The tree is built bottom-up, from the leaves towards the root. Note that inner nodes may have children which do not intersect the range of the update to be processed. For any given snapshot version v, these nodes form the set of border nodes B_v. When building the metadata tree, the versions of these nodes are required. The version manager is responsible for computing the versions of the border nodes so as to ensure proper update ordering and consistency.
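The traversal and the version-based sharing can be illustrated with a small self-contained Python sketch (illustrative only: the tuple-keyed dictionary stands in for the metadata providers' DHT, the child-version links are our reconstruction of the weaving mechanism detailed in [12], and we reuse Figure 2's convention of a page size of 1). It rebuilds the trees of Figures 2(a) and 2(b) and reads the whole range of version 1, reaching the two untouched leaves through the subtree shared with version 0:

    dht = {}  # toy stand-in for the metadata providers' DHT

    def put_node(version, offset, size, left_version=None,
                 right_version=None, page=None):
        # a tree node is uniquely identified by (version, offset, size)
        dht[(version, offset, size)] = (left_version, right_version, page)

    def read_metadata(version, offset, size, r_off, r_size, leaves):
        """Collect the pages of snapshot `version` whose leaves intersect
        the requested range [r_off, r_off + r_size)."""
        if offset + size <= r_off or r_off + r_size <= offset:
            return  # no intersection: skip this subtree entirely
        left_v, right_v, page = dht[(version, offset, size)]  # one DHT fetch
        if size == 1:  # leaf: covers exactly one page (page size 1, as in Fig. 2)
            leaves.append(page)
            return
        half = size // 2
        # children may carry older version numbers: this is how successive
        # snapshots share unmodified subtrees ("weaving")
        read_metadata(left_v, offset, half, r_off, r_size, leaves)
        read_metadata(right_v, offset + half, half, r_off, r_size, leaves)

    # Version 0: a write of four pages, as in Figure 2(a).
    for p in range(4):
        put_node(0, p, 1, page="v0-page%d" % p)
    put_node(0, 0, 2, left_version=0, right_version=0)
    put_node(0, 2, 2, left_version=0, right_version=0)
    put_node(0, 0, 4, left_version=0, right_version=0)

    # Version 1: pages 2 and 3 overwritten, as in Figure 2(b); the new root
    # points to the untouched left subtree of version 0.
    put_node(1, 2, 1, page="v1-page2")
    put_node(1, 3, 1, page="v1-page3")
    put_node(1, 2, 2, left_version=1, right_version=1)
    put_node(1, 0, 4, left_version=0, right_version=1)

    pages = []
    read_metadata(1, 0, 4, 0, 4, pages)
    assert pages == ["v0-page0", "v0-page1", "v1-page2", "v1-page3"]

Because a node's DHT key includes its version, writers only ever insert new nodes and never overwrite existing ones, which is what allows concurrent readers and writers to access metadata without synchronization.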
5 Experimental evaluation

To illustrate the advantages of our proposed approach we have performed a set of experiments that study the impact of both data and metadata distribution on the achieved performance levels under heavy write concurrency. All experimentation has been performed using our BlobSeer prototype.

Our implementation of the metadata provider relies on a custom DHT (Distributed Hash Table) based on a simple static distribution scheme. Communication between nodes is implemented on top of an asynchronous RPC library we developed on top of the Boost C++ ASIO library [16]. The experiments have been performed on the Grid'5000 [17] testbed, a reconfigurable, controllable and monitorable Grid platform gathering 9 sites in France. For each experiment, we used nodes located within a single site (Rennes or Orsay). The nodes are outfitted with x86_64 CPUs and 4 GB of RAM. Intracluster bandwidth is 1 Gbit/s (measured: 117.5 MB/s for TCP sockets with MTU = 1500 B), latency is 0.1 ms.

Fig. 3. Data distribution benefits under high write concurrency. (a) Impact of data distribution on the average output bandwidth of workers (bandwidth in MB/s versus number of data providers, for 64 MB pages and 1, 20, 40 and 60 concurrent writers); (b) impact of job output size on the average output bandwidth of workers (bandwidth in MB/s versus number of concurrent writers, for 8 MB, 16 MB and 32 MB pages).

5.1 Benefits of data decentralization

To evaluate the impact of data decentralization on the workers' write performance, we consider a set of concurrent workers that write their output data in parallel, and we measure the average write bandwidth.

Impact of the number of data providers. In this setting, we deploy a version manager, a provider manager and a variable number of data providers. We also deploy a fixed number of workers and synchronize them to start writing output data simultaneously. Each process is deployed on a dedicated node within the same cluster. As this experiment aims at evaluating the impact of data decentralization, we fix the page size at 64 MB, large enough to generate only a minimal metadata management overhead. We assume each worker writes a single page. A single metadata provider is thus sufficient for this experiment.

Each worker generates and writes its output data 50 times. We compute the average bandwidth achieved by all workers, over all their iterations. The experiment is repeated by varying the total number of workers from 1 to 60.

Results are shown on Figure 3(a): for a single worker, using more than one data provider does not make any difference, since only a single data provider is contacted at a time. However, when multiple workers concurrently write their output data, the benefits of data distribution become visible. Increasing the number of data providers leads to a dramatic increase in bandwidth performance: from a couple of MB/s to over 100 MB/s when using 60 data providers. Bandwidth performance flattens rapidly once the number of data providers is at least the number of workers. This is explained by the fact that using at least as many data providers as workers enables the provider manager to direct each concurrent write request to a distinct data provider.
Under such conditions, the bandwidth measured for a single worker under no concurrency (115 MB/s) is just 12% higher than the average bandwidth reached when 60 workers write their output data concurrently (102 MB/s).

Impact of the page size. We then evaluate the impact of the data output size on the achieved write bandwidth. As in the previous setting, we deploy a version manager, a provider manager and a metadata provider. This time we fix the number of providers to 60. We deploy a variable number of workers and synchronize them to start writing output data simultaneously. Each worker iteratively generates its job output and writes it as a single page to BlobSeer (50 iterations). The achieved bandwidth is averaged over all workers. We repeat the experiment for different sizes of the job output: 32 MB, 16 MB, 8 MB. As can be observed in Figure 3(b), a high bandwidth is sustained as long as the page is large enough. The average client bandwidth drops from 110 MB/s (for 32 MB pages) to 102 MB/s (for 8 MB pages).

5.2 Benefits of metadata decentralization

In the previous set of experiments we intentionally minimized the impact of metadata management, in order to emphasize the impact of data distribution. As a next step, we study how metadata decentralization impacts performance. We consider a setting with a large number of workers, each of which concurrently generates many pages, so that metadata management becomes a concern.

Impact of the number of metadata providers. To evaluate the impact of metadata distribution as accurately as possible, we first deploy as many data providers as workers, to avoid potential bottlenecks on the providers. Each process is deployed on a separate physical node. We deploy a version manager, a provider manager and a fixed number of 60 providers. We then launch 60 workers and synchronize them to start writing output data simultaneously. Each worker iteratively generates a fixed-size 64 MB output and writes it to BlobSeer (50 iterations). The achieved bandwidth is averaged over all workers. We vary the number of metadata providers from 1 to 30. We repeat the whole experiment for various page sizes (64 KB, 128 KB and 256 KB).

Results on Figure 4(a) show that increasing the number of metadata providers results in an improved average bandwidth under heavy concurrency. The improvement is more significant when reducing the page size: since the amount of associated metadata doubles when the page size halves, the I/O pressure on the metadata providers doubles too. We can thus observe that the use of a centralized metadata provider leads to a clear bottleneck (62 MB/s only), whereas using 30 metadata providers improves the write bandwidth by over 20% (75 MB/s).

Impact of the co-deployment of data and metadata providers. In order to increase the scope of our evaluation without increasing the number of physical network nodes, we perform an additional experiment. We keep exactly the same setting as previously, but we co-deploy a data provider and a metadata provider on each physical node (instead of deploying them on separate nodes). We expect to measure a consistent performance drop and to establish a correlation with the results of the previous experiment. This can enable us to predict the performance behavior of our system for larger physical configurations. Results are shown in Figure 4(b), for two page sizes: 128 KB and 64 KB. For reference, the results obtained in the previous experiment for the same job outputs are also shown.