Abdelkader Hameurlain Wenny Rahayu David Taniar (Eds.) Data Management 9 5 in Cloud, Grid 0 8 S C and P2P Systems N L 6th International Conference, Globe 2013 Prague, Czech Republic, August 2013 Proceedings 123 Lecture Notes in Computer Science 8059 CommencedPublicationin1973 FoundingandFormerSeriesEditors: GerhardGoos,JurisHartmanis,andJanvanLeeuwen EditorialBoard DavidHutchison LancasterUniversity,UK TakeoKanade CarnegieMellonUniversity,Pittsburgh,PA,USA JosefKittler UniversityofSurrey,Guildford,UK JonM.Kleinberg CornellUniversity,Ithaca,NY,USA AlfredKobsa UniversityofCalifornia,Irvine,CA,USA FriedemannMattern ETHZurich,Switzerland JohnC.Mitchell StanfordUniversity,CA,USA MoniNaor WeizmannInstituteofScience,Rehovot,Israel OscarNierstrasz UniversityofBern,Switzerland C.PanduRangan IndianInstituteofTechnology,Madras,India BernhardSteffen TUDortmundUniversity,Germany MadhuSudan MicrosoftResearch,Cambridge,MA,USA DemetriTerzopoulos UniversityofCalifornia,LosAngeles,CA,USA DougTygar UniversityofCalifornia,Berkeley,CA,USA GerhardWeikum MaxPlanckInstituteforInformatics,Saarbruecken,Germany Abdelkader Hameurlain Wenny Rahayu David Taniar (Eds.) Data Management in Cloud, Grid and P2P Systems 6th International Conference, Globe 2013 Prague, Czech Republic, August 28-29, 2013 Proceedings 1 3 VolumeEditors AbdelkaderHameurlain PaulSabatierUniversity IRITInstitutdeRechercheenInformatiquedeToulouse 118,routedeNarbonne,31062ToulouseCedex,France E-mail:[email protected] WennyRahayu LaTrobeUniversity DepartmentofComputerScienceandComputerEngineering Melbourne,VIC3086,Australia E-mail:[email protected] DavidTaniar MonashUniversity ClaytonSchoolofInformationTechnology Clayton,VIC3800,Australia E-mail:[email protected] ISSN0302-9743 e-ISSN1611-3349 ISBN978-3-642-40052-0 e-ISBN978-3-642-40053-7 DOI10.1007/978-3-642-40053-7 SpringerHeidelbergDordrechtLondonNewYork LibraryofCongressControlNumber:2013944289 CRSubjectClassification(1998):H.2,C.2,I.2,H.3 LNCSSublibrary:SL3–InformationSystemsandApplication,incl.Internet/Web andHCI ©Springer-VerlagBerlinHeidelberg2013 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped.Exemptedfromthislegalreservationarebriefexcerptsinconnection withreviewsorscholarlyanalysisormaterialsuppliedspecificallyforthepurposeofbeingenteredand executedonacomputersystem,forexclusiveusebythepurchaserofthework.Duplicationofthispublication orpartsthereofispermittedonlyundertheprovisionsoftheCopyrightLawofthePublisher’slocation, initscurrentversion,andpermissionforusemustalwaysbeobtainedfromSpringer.Permissionsforuse maybeobtainedthroughRightsLinkattheCopyrightClearanceCenter.Violationsareliabletoprosecution undertherespectiveCopyrightLaw. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Whiletheadviceandinformationinthisbookarebelievedtobetrueandaccurateatthedateofpublication, neithertheauthorsnortheeditorsnorthepublishercanacceptanylegalresponsibilityforanyerrorsor omissionsthatmaybemade.Thepublishermakesnowarranty,expressorimplied,withrespecttothe materialcontainedherein. Typesetting:Camera-readybyauthor,dataconversionbyScientificPublishingServices,Chennai,India Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) Preface Globe is now an established conference on data management in cloud, grid and peer-to-peer systems. These systems are characterized by high heterogeneity, highautonomyanddynamicsofnodes,decentralizationofcontrolandlarge-scale distributionofresources.Thesecharacteristicsbringnewdimensionsanddifficult challenges to tackling data management problems. The still open challenges to data management in cloud, grid and peer-to-peer systems are multiple, such as scalability, elasticity, consistency, data storage, security and autonomic data management. The 6th International Conference on Data Management in Grid and P2P Systems (Globe 2013) was held during August 28–29, 2013, in Prague, Czech Republic. The Globe Conference provides opportunities for academics and in- dustry researchers to present and discuss the latest data management research and applications in cloud, grid and peer-to-peer systems. Globe 2013 received 19 papers from 11 countries. The reviewing process led to the acceptance of 10 papers for presentation at the conference and inclusion inthis LNCSvolume.Eachpaperwasreviewedbyatleastthree ProgramCom- mittee members. The selected papers focus mainly on data management (e.g., data partitioning, storage systems, RDF data publishing, querying linked data, consistency), MapReduce applications, and virtualization. The conference would not have been possible without the support of the ProgramCommittee members, external reviewers, members of the DEXA Con- ference Organizing Committee, and the authors. In particular, we would like to thank GabrielaWagnerandRolandWagner(FAW, UniversityofLinz)for their help in the realization of this conference. June 2013 Abdelkader Hameurlain Wenny Rahayu David Taniar Organization Conference Program Chairpersons Abdelkader Hameurlain IRIT, Paul Sabatier University, Toulouse, France David Taniar Clayton School of Information Technology, Monash University, Clayton, Victoria, Australia Publicity Chair Wenny Rahayu La Trobe University, Victoria, Australia Program Committee Philippe Balbiani IRIT, Paul Sabatier University, Toulouse, France Nadia Bennani LIRIS, INSA of Lyon, France Djamal Benslimane LIRIS, Universty of Lyon, France Lionel Brunie LIRIS, INSA of Lyon, France Elizabeth Chang Digital Ecosystems & Business intelligence Institute,CurtinUniversity,Perth,Australia Qiming Chen HP Labs, Palo Alto, California, USA, Alfredo Cuzzocrea ICAR-CNR, University of Calabria, Italy Fr´ed´eric Cuppens Telecom, Bretagne, France Bruno Defude Telecom INT, Evry, France Kayhan Erciyes Ege University, Izmir, Turkey Shahram Ghandeharizadeh University of Southern California, USA Tasos Gounaris Aristotle University of Thessaloniki, Greece Farookh Hussain University of Technology Sydney (UTS), Sydney, Australia Sergio Ilarri University of Zaragoza, Spain Ismail Khalil Johannes Kepler University, Linz, Austria Gildas Menier LORIA, University of South Bretagne, France Anirban Mondal University of Delhi, India Riad Mokadem IRIT, Paul Sabatier University, Toulouse, France VIII Organization Franck Morvan IRIT, Paul Sabatier University, Toulouse, France Fa¨ıza Najjar National Computer Science School, Tunis, Tunisia Kjetil Nørv˚ag Norwegian University of Science and Technology, Trondheim, Norway Jean-Marc Pierson IRIT, Paul Sabatier University, Toulouse, France Claudia Roncancio LIG, Grenoble University, France Florence Sedes IRIT, Paul Sabatier University, Toulouse, France Fabricio A.B. Silva Army Technological Center, Rio de Janeiro, Brazi M´ario J.G. Silva University of Lisbon, Portugal Hela Skaf LINA, Nantes University, France A. Min Tjoa IFS, Vienna University of Technology, Austria Farouk Toumani LIMOS, Blaise Pascal University, France Roland Wagner FAW, University of Linz, Austria Wolfram W¨oß FAW, University of Linz, Austria External Reviewers Christos Doulkeridis University of Piraeus, Greece Franck Ravat IRIT, Paul Sabatier University, Toulouse, France Raquel Trillo University of Zaragoza, Spain Shaoyi Yin IRIT, Paul Sabatier University, Toulouse, France Table of Contents Data Partitioning and Consistency Data Partitioning for Minimizing Transferred Data in MapReduce ..... 1 Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Esther Pacitti, and Patrick Valduriez Incremental Algorithms for Selecting Horizontal Schemas of Data Warehouses: The Dynamic Case ................................... 13 Ladjel Bellatreche, Rima Bouchakri, Alfredo Cuzzocrea, and Sofian Maabout Scalable and Fully Consistent Transactions in the Cloud through Hierarchical Validation ........................................... 26 Jon Grov and Peter Csaba O¨lveczky RDF Data Publishing, Querying Linked Data, and Applications A Distributed Publish/Subscribe System for RDF Data............... 39 Laurent Pellegrino, Fabrice Huet, Fran¸coise Baude, and Amjad Alshabani An Algorithm for Querying Linked Data Using Map-Reduce........... 51 Manolis Gergatsoulis, Christos Nomikos, Eleftherios Kalogeros, and Matthew Damigos Effects of Network Structure Improvement on Distributed RDF Querying ....................................................... 63 Liaquat Ali, Thomas Janson, Georg Lausen, and Christian Schindelhauer Deploying a Multi-interface RESTful Application in the Cloud......... 75 Erik Albert and Sudarshan S. Chawathe Distributed Storage Systems and Virtualization Using Multiple Data Stores in the Cloud: Challenges and Solutions..... 87 Rami Sellami and Bruno Defude X Table of Contents Repair Time in Distributed Storage Systems......................... 99 Fr´ed´eric Giroire, Sandeep Kumar Gupta, Remigiusz Modrzejewski, Julian Monteiro, and St´ephane Perennes Development and Evaluation of a Virtual PC Type Thin Client System ......................................................... 111 Katsuyuki Umezawa, Tomoya Miyake, and Hiromi Goto Author Index.................................................. 125 Data Partitioning for Minimizing Transferred Data in MapReduce Miguel Liroz-Gistau1, Reza Akbarinia1, Divyakant Agrawal2, Esther Pacitti3, and Patrick Valduriez1 1 INRIA& LIRMM, Montpellier, France {Miguel.Liroz Gistau,Reza.Akbarinia,Patrick.Valduriez}@inria.fr 2 University of California, SantaBarbara [email protected] 3 UniversityMontpellier 2, INRIA& LIRMM, Montpellier, France [email protected] Abstract. ReducingdatatransferinMapReduce’sshufflephaseisvery important because it increases data locality of reduce tasks, and thus decreases the overhead of job executions. In the literature, several op- timizations have been proposed to reduce data transfer between map- persandreducers.Nevertheless,alltheseapproachesarelimitedbyhow intermediate key-value pairs are distributed over map outputs. In this paper, we address the problem of high data transfers in MapReduce, and propose a technique that repartitions tuples of the input datasets, and thereby optimizes the distribution of key-values over mappers, and increases the data locality in reduce tasks. Our approach captures the relationships between inputtuplesand intermediatekeysbymonitoring the execution of a set of MapReduce jobs which are representative of theworkload.Then,basedonthoserelationships,itassignsinputtuples to the appropriate chunks. We evaluated our approach through experi- mentation in a Hadoop deployment on top of Grid5000 using standard benchmarks.Theresultsshowhighreductionindatatransferduringthe shufflephase compared to NativeHadoop. 1 Introduction MapReduce [4] has established itself as one of the most popular alternatives for big data processing due to its programmingmodel simplicity and automatic management of parallel execution in clusters of machines. Initially proposed by Google to be used for indexing the web, it has been applied to a wide range of problems having to process big quantities of data, favored by the popularity of Hadoop [2], an open-source implementation. MapReduce divides the compu- tation in two main phases, namely map and reduce, which in turn are carried out by several tasks that process the data in parallel. Between them, there is a phase, called shuffle, where the data produced by the map phase is ordered, partitioned and transferred to the appropriate machines executing the reduce phase. A.Hameurlain,W.Rahayu,andD.Taniar(Eds.):Globe2013,LNCS8059,pp.1–12,2013. (cid:2)c Springer-VerlagBerlinHeidelberg2013