Transactions on Large-Scale Data- and Knowledge-Centered Systems XXII PDF

192 Pages·2015·12.975 MB·English

Transactions on Large-Scale Data- and Knowledge-Centered Systems XXII

Abdelkader Hameurlain • Josef Küng • Roland Wagner (Eds.)

LNCS 9430, Journal Subline

Lecture Notes in Computer Science 9430
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zürich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/8637

Editors-in-Chief

Abdelkader Hameurlain, IRIT, Paul Sabatier University, Toulouse, France
Roland Wagner, FAW, University of Linz, Linz, Austria
Josef Küng, FAW, University of Linz, Linz, Austria

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-662-48566-8, ISBN 978-3-662-48567-5 (eBook)
DOI 10.1007/978-3-662-48567-5
Library of Congress Control Number: 2015950449
Springer Heidelberg New York Dordrecht London
© Springer-Verlag Berlin Heidelberg 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper
Springer-Verlag GmbH Berlin Heidelberg is part of Springer Science+Business Media (www.springer.com)

Preface

This volume contains six fully revised selected regular papers. The content of this volume covers a wide range of different and very hot topics in the field of data- and knowledge-management systems.
Topics covered include algorithms for large-scale private analysis, modelling of entities from social and digital worlds and their relations, querying virtual security views of XML data, recommendation approaches using diversity-based clustering scores, hypothesis discovery, and data aggregation techniques in sensor network environments.

We would like to express our thanks to the editorial board and the external reviewers for thoroughly refereeing the submitted papers and ensuring the high quality of this volume. Special thanks go to Gabriela Wagner for her high availability and her valuable work in the realization of this TLDKS volume.

July 2015
Abdelkader Hameurlain
Josef Küng
Roland Wagner

Organization

Editorial Board

Reza Akbarinia, Inria, France
Stéphane Bressan, National University of Singapore, Singapore
Dagmar Auer, FAW, Austria
Bernd Amann, LIP6 - UPMC, France
Francesco Buccafurri, Università Mediterranea di Reggio Calabria, Italy
Qiming Chen, HP-Lab, USA
Tommaso Di Noia, Politecnico di Bari, Italy
Dirk Draheim, University of Innsbruck, Austria
Johann Eder, Alpen Adria University Klagenfurt, Austria
Georg Gottlob, Oxford University, UK
Anastasios Gounaris, Aristotle University of Thessaloniki, Greece
Theo Härder, Technical University of Kaiserslautern, Germany
Andreas Herzig, IRIT, Paul Sabatier University, France
Hilda Kosorus, FAW, Austria
Dieter Kranzlmüller, Ludwig-Maximilians-Universität München, Germany
Philippe Lamarre, INSA Lyon, France
Lenka Lhotská, Technical University of Prague, Czech Republic
Vladimir Marik, Technical University of Prague, Czech Republic
Mukesh Mohania, IBM India, India
Franck Morvan, Paul Sabatier University, IRIT, France
Kjetil Nørvåg, Norwegian University of Science and Technology, Norway
Gultekin Ozsoyoglu, Case Western Reserve University, USA
Themis Palpanas, Paris Descartes University, France
Torben Bach Pedersen, Aalborg University, Denmark
Günther Pernul, University of Regensburg, Germany
Sherif Sakr, University of New South Wales, Australia
Klaus-Dieter Schewe, University of Linz, Austria
A Min Tjoa, Vienna University of Technology, Austria
Chao Wang, Oak Ridge National Laboratory, USA

External Reviewers

Joshua Amavi, Orléans University, LIFO, France
Yannick Chavalier, Paul Sabatier University, IRIT, France
Mirian Halfeld-Ferrari, Orléans University, LIFO, France
Jorge Martinez-Gil, Software Competence Center Hagenberg, Austria
Shaoyi Yin, Paul Sabatier University, IRIT, France

Contents

BPMiner: Algorithms for Large-Scale Private Analysis
    Quach Vinh Thanh and Anwitaman Datta
System Modeling and Trust Evaluation of Distributed Systems
    Nagham Alhadad, Patricia Serrano-Alvarado, Yann Busnel, and Philippe Lamarre
Efficient Querying of XML Data Through Arbitrary Security Views
    Houari Mahfoud and Abdessamad Imine
Increasing Coverage in Distributed Search and Recommendation with Profile Diversity
    Maximilien Servajean, Esther Pacitti, Miguel Liroz-Gistau, Sihem Amer-Yahia, and Amr El Abbadi
Hypothesis Discovery Exploiting Closed Chains of Relations
    Kazuhiro Seki
An Analysis of Variance-Based Methods for Data Aggregation in Periodic Sensor Networks
    Hassan Harb, Abdallah Makhoul, David Laiymani, Oussama Bazzi, and Ali Jaber
Author Index

BPMiner: Algorithms for Large-Scale Private Analysis

Quach Vinh Thanh and Anwitaman Datta
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
{vtquach,anwitaman}@ntu.edu.sg

Abstract. An abundance of data generated from a multitude of sources, and intelligence derived by analyzing the same, has become an important asset across many walks of life. Simultaneously, it raises serious concerns about privacy.
Differential privacy has become a popular way to reason about the amount of information about individual entries of a dataset that is divulged upon giving out a perturbed result for a query on a given dataset. However, current differentially-private algorithms are computationally inefficient, and do not explicitly exploit the abundance of data, thus wearing out the privacy budget irrespective of the volume of data. In this paper, we propose BPMiner, a solution that is both private and accurate, while simultaneously addressing the computation and budget challenges of very big datasets. The main idea is a non-trivial combination of differential privacy, sample-and-aggregation, and a classical statistical methodology called sequential estimation. Rigorous proofs regarding the privacy and asymptotic accuracy of our solution are provided. Furthermore, experimental results over multiple datasets demonstrate that BPMiner outperforms current private algorithms in terms of computational and budget efficiencies, while achieving comparable accuracy. Overall, BPMiner is a practical solution based on strong theoretical foundations for privacy-preserving analysis on big datasets.

Keywords: Privacy budget · Differential privacy · Sample-and-aggregation · Large-scale analysis

© Springer-Verlag Berlin Heidelberg 2015. A. Hameurlain et al. (Eds.): TLDKS XXII, LNCS 9430, pp. 1–32, 2015. DOI: 10.1007/978-3-662-48567-5_1

1 Introduction

1.1 Motivation

Recent years have witnessed a tremendous explosion of data generated from different walks of life. The desire to analyze such enormous volumes of data is vast: mining of massive datasets can help us gain valuable insight which leads to significant advances [34]. Consequently, the last few years have witnessed a growing interest both in academia as well as industry to find scalable solutions for large-scale data analytics. The advantages of analytics notwithstanding, the associated privacy implications are of growing concern. It is desirable to carry out high-quality analyses, while protecting the confidentiality of information about individuals¹ in a dataset. Privacy protection would help alleviate a data holder's anxiety and also encourage (when individuals may have the choice) him/her to contribute corresponding data for analytics.

Given these issues, a natural question is: How do we derive intelligence from massive datasets while preserving every individual's privacy? Differential privacy [14,16] was proposed to address the 'privacy' concern. It prevents compromise of an individual's privacy by ensuring that the presence of his/her data does not significantly change the distribution on the released results, and consequently, the result does not divulge anything more about the individual entries in the dataset. Differential privacy has a competitive advantage over other privacy models: it offers a strong privacy guarantee irrespective of the adversary's background knowledge. Therefore, we employ differential privacy as our desired notion of privacy. Under the umbrella of differential privacy, we also assume the well-known query-response model, in which a trusted curator (interactively) responds to multiple queries from the analyst. This model is chosen because (1) it allows derivation of theoretical insight [39] while providing rigor-based privacy assurance, and (2) algorithms under this platform often offer better accuracy (since they focus only on the queries of interest [18]). As a consequence, it has attracted a deluge of works in recent years, e.g., [18,28,33,37,38], and it can also be seen in common frameworks like PINQ [28], SuLQ [8] or GUPT [29].

Computational Efficiency. The 'massive datasets' aspect, on the other hand, is less trivial to address. Literature on differential privacy assumes operations on a statistical database of reasonable size, but the algorithms are not optimized specifically to deal with extremely large datasets.
Even without privacy requirements, scaling non-private mining algorithms to big datasets is itself a delicate topic [34]. An advocate for sampling, however, may argue that this challenge can be addressed. On handling big data, a simple methodology is divide-and-conquer, using sampling to permit computation on relatively small subsets [21,22]. Sample-and-aggregate (SaG) [18,29,33,37,38] is a well-known class of differentially private techniques that analyze various samples (blocks) of the original data. As a consequence, SaG algorithms (although not originally intended for this purpose) seem well suited to massive datasets.

However, a deeper investigation invalidates this inference. Current SaG algorithms enjoy strong, rigorous analysis in which the number of blocks increases asymptotically with data size [18,37,38]. Although computation is conducted on a small-sized data block, such a large number of blocks may render the total execution time unacceptable. We illustrate this unfavorable situation with a synthetic dataset of 1,000,000 rows and 100 columns (which takes up around 800 MB, assuming 8 bytes in storage for double floating-point numbers). We generate data in the same way as [22]: for each i-th row, the first 99 columns are

¹ In this paper, an 'individual' refers to an entry in a statistical database, which may correspond to information about a real-world entity, e.g., a patient's record, a financial transaction, etc.
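
The sample-and-aggregate idea described above can be illustrated with a minimal Python sketch. It is not BPMiner and not the algorithm of any cited work: it assumes one common aggregation choice (clamp each block estimate to a public range, average, then add Laplace noise), and the function names, the range [lo, hi], the block count and the value of epsilon are all hypothetical choices made for illustration. The last line reproduces the storage arithmetic quoted in the text.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sample_and_aggregate(data, estimator, num_blocks, lo, hi, epsilon, rng):
    """Hypothetical SaG sketch (assumes num_blocks <= len(data), epsilon > 0).

    Splits the data into disjoint blocks, runs the estimator on each block,
    clamps every block estimate to [lo, hi], averages the clamped estimates,
    and adds Laplace noise calibrated to the sensitivity of that average.
    """
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Disjoint blocks: each record influences exactly one block estimate.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    estimates = [min(hi, max(lo, estimator(block))) for block in blocks]
    # Changing one record moves one clamped estimate by at most (hi - lo),
    # so the average moves by at most (hi - lo) / num_blocks.
    sensitivity = (hi - lo) / num_blocks
    return statistics.fmean(estimates) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
private_mean = sample_and_aggregate(
    data, statistics.fmean, num_blocks=100, lo=0.0, hi=100.0, epsilon=1.0, rng=rng
)

# Storage arithmetic from the text: 1,000,000 rows x 100 columns x 8 bytes.
dataset_bytes = 1_000_000 * 100 * 8
```

In this sketch the noise scale shrinks as the number of blocks grows, but the estimator must then be run once per block, which is the computational cost the passage above points to.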
