Lecture Notes in Computer Science 7720
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

Abdelkader Hameurlain, Josef Küng, Roland Wagner (Eds.)
Transactions on Large-Scale Data- and Knowledge-Centered Systems VII

Editors-in-Chief
Abdelkader Hameurlain, Paul Sabatier University, IRIT, 118 route de Narbonne, 31062 Toulouse Cedex, France. E-mail: [email protected]
Josef Küng and Roland Wagner, University of Linz, FAW, Altenbergerstraße 69, 4040 Linz, Austria. E-mail: {jkueng,rrwagner}@faw.at

ISSN 0302-9743 (LNCS), e-ISSN 1611-3349 (LNCS)
ISSN 1869-1994 (TLDKS)
ISBN 978-3-642-35331-4, e-ISBN 978-3-642-35332-1
DOI 10.1007/978-3-642-35332-1
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012952682
CR Subject Classification (1998): H.2.4, H.2.8, I.2, E.2, I.6.5, H.3.1, D.3.2-3, G.3, E.1

© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This volume is the second so-called regular volume of our journal. We find here the result of the review process covering all the submissions that have been sent directly to the journal administration since our last regular volume in 2012. Recognized scientists checked the quality and gave helpful feedback to the authors – many thanks to them from the editorial board.

This review process resulted in the selection of five contributions. Their range is from data management via data streams to service oriented computing, and from an abstract algebraic framework via RDF and ontologies to a conceptual model framework.

In the first contribution, ‘RFID Data Management and Analysis via Tensor Calculus’, Roberto De Virgilio and Franco Milicchio meet the challenge of managing the increasing amount of RFID data in supply chains. They introduce a novel algebraic framework for modeling a supply chain and for efficiently performing analysis. The main advantage of this approach is its theoretical soundness. Tensor calculus provides the background for both modeling and querying.

Then, Abhirup Chakraborty and Ajit Singh address the issue of processing exact sliding window joins between data streams in a memory-limited environment having burstiness in stream arrivals.
‘Processing Exact Results for Windowed Stream Joins in a Memory-Limited System: A Disk-Based, Adaptive Approach’ provides a framework and proposes an algorithm to solve that problem. Disk storage and a smart I/O strategy are used to meet the goals.

In the third article, we switch to semantic issues. In ‘Reducing the Semantic Heterogeneity of Unstructured P2P Systems: A Contribution Based on a Dissemination Protocol’ Thomas Cerqueus, Sylvie Cazalens and Philippe Lamarre address the situation that arises when, in a peer-to-peer system, different peers are using different ontologies to annotate their data. First they propose a set of measures to characterize different facets of heterogeneity. Then they introduce a gossip-based protocol that allows some of them to be reduced.

‘Towards a Scalable Semantic Provenance Management System’, written by Mohamed Amin Sakka and Bruno Defude, presents a new provenance management system. It allows provenance sources to be imported and enriched semantically to obtain a high-level representation of provenance. The power of semantic web technologies is used for heterogeneous, multiple-source and decentralized provenance integration. Thus, rich answers about the whole document lifecycle can be provided and trustworthiness can be increased.

Finally, Colin Atkinson, Philipp Bostan and Dirk Draheim concentrate on service-oriented distributed systems. In ‘A Unified Conceptual Framework for Service-Oriented Computing: Aligning Models of Architecture and Utilization’ they support various concepts and models and thereby make it possible to customize and simplify each client developer’s view as well as the way in which service providers develop and maintain their services. The paper provides a unified conceptual framework for describing the components of service-oriented computing systems, in particular its foundations and concepts, and a small example to show how the models are applied in practice.
Last but not least we would like to thank Gabriela Wagner for supporting us with the organization and we hope that you enjoy this TLDKS volume.

October 2012
Abdelkader Hameurlain
Josef Küng
Roland Wagner

Editorial Board
Reza Akbarinia, INRIA, France
Stéphane Bressan, National University of Singapore, Singapore
Francesco Buccafurri, Università Mediterranea di Reggio Calabria, Italy
Qiming Chen, HP-Lab, USA
Tommaso Di Noia, Politecnico di Bari, Italy
Dirk Draheim, University of Innsbruck, Austria
Johann Eder, Alpen Adria University Klagenfurt, Austria
Stefan Fenz, Vienna University of Technology, Austria
Georg Gottlob, Oxford University, UK
Anastasios Gounaris, Aristotle University of Thessaloniki, Greece
Theo Härder, Technical University of Kaiserslautern, Germany
Dieter Kranzlmüller, Ludwig-Maximilians-Universität München, Germany
Philippe Lamarre, University of Nantes, France
Lenka Lhotská, Technical University of Prague, Czech Republic
Vladimir Marik, Technical University of Prague, Czech Republic
Dennis McLeod, University of Southern California, USA
Mukesh Mohania, IBM India, India
Tetsuya Murai, Hokkaido University, Japan
Gultekin Ozsoyoglu, Case Western Reserve University, USA
Oscar Pastor, Polytechnic University of Valencia, Spain
Torben Bach Pedersen, Aalborg University, Denmark
Günther Pernul, University of Regensburg, Germany
Klaus-Dieter Schewe, University of Linz, Austria
Makoto Takizawa, Seikei University Tokyo, Japan
David Taniar, Monash University, Australia
A Min Tjoa, Vienna University of Technology, Austria

Table of Contents
RFID Data Management and Analysis via Tensor Calculus
    Roberto De Virgilio and Franco Milicchio
Processing Exact Results for Windowed Stream Joins in a Memory-Limited System: A Disk-Based, Adaptive Approach
    Abhirup Chakraborty and Ajit Singh
Reducing the Semantic Heterogeneity of Unstructured P2P Systems: A Contribution Based on a Dissemination Protocol
    Thomas Cerqueus, Sylvie Cazalens, and Philippe Lamarre
Towards a Scalable Semantic Provenance Management System
    Mohamed Amin Sakka and Bruno Defude
A Unified Conceptual Framework for Service-Oriented Computing: Aligning Models of Architecture and Utilization
    Colin Atkinson, Philipp Bostan, and Dirk Draheim
Author Index

RFID Data Management and Analysis via Tensor Calculus*

Roberto De Virgilio and Franco Milicchio
Dipartimento di Informatica e Automazione, Università Roma Tre, Rome, Italy
{dvr,milicchio}@dia.uniroma3.it

Abstract. Traditional Radio-Frequency IDentification (RFID) applications have been focused on replacing bar codes in supply chain management. The importance of this new resource has soared in recent years, mainly due to the retailers’ need to govern supply chains. However, due to the massive amount of RFID-related information in supply chain management, attaining satisfactory performance in analyzing such data sets is a challenging issue. Popular approaches provide hard-coded solutions with high consumption of resources; moreover, they adapt very poorly to multidimensional queries at various levels of granularity and complexity.

In this paper we propose a novel model for supply chain management, aiming at generality, correctness, and simplicity. The model is based on the first principles of multilinear algebra, specifically of tensorial calculus. Leveraging our abstract algebraic framework, we envision a system allowing both quick decentralized on-line item discovery and centralized off-line massive business logic analysis, according to the needs and requirements of supply chain actors. Since our computations are based on vectorial calculus, we are able to exploit the underlying hardware processors, achieving a huge performance boost, as the experimental results show.
Moreover, by storing only the needed data, and benefiting from linear properties, we are able to carry out the required computations even in highly memory-constrained environments, such as mobile devices, and with parallel and distributed technologies, by subdividing our tensor objects into sub-blocks and processing them independently.

* This work has been partially funded by the Italian Ministry of Research, grant number RBIP06BZW8, FIRB project “Advanced tracking system in intermodal freight transportation”.

A. Hameurlain et al. (Eds.): TLDKS VII, LNCS 7720, pp. 1–30, 2012. © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

A supply chain is a complex system for transferring products or services from a supplier to a final customer. Aiming at improving the exchange of information, retailers are investing in new technologies for managing supply chains. To this end, RFID (Radio-Frequency Identification) is a recent wireless technology that enables a direct link between a product and a virtual one within information systems [1, 2]. Generally, RFID tags attached to products store an identification code named EPC (Electronic Product Code [3], a coding scheme for RFID tags aimed at uniquely identifying them), which is used as a key to retrieve relevant properties of an object from a database, usually within a networked infrastructure.

An RFID application typically generates a set of tuples, usually called raw data, in the form of triples $(e, l, t)$, where $e$ is an EPC, $l$ represents the location where an RFID reader has scanned the object $e$, and $t$ is the time when the reading took place; while these are the fundamental properties, others can be retrieved as well, e.g., temperature, pressure, and humidity. Tags may have multiple subsequent readings at the same location, potentially generating vast amounts of raw data.
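To make the raw-data format concrete, the following minimal sketch builds a few $(e, l, t)$ readings and collapses consecutive readings of the same tag at the same location into a single record with an entry and an exit time, which is the stay-record idea the text turns to next. The names `Reading`, `StayRecord`, and `to_stay_records`, the integer timestamps, and the sample values are illustrative assumptions; they are not taken from the paper.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One raw reading (e, l, t); field names are illustrative."""
    epc: str       # Electronic Product Code of the tag
    location: str  # location of the reader that scanned the tag
    time: int      # time of the reading (arbitrary integer units)


class StayRecord(NamedTuple):
    """One cleaned record: a tag, a location, entry and exit times."""
    epc: str
    location: str
    time_in: int
    time_out: int


def to_stay_records(readings):
    """Collapse consecutive readings of one tag at one location."""
    records = []
    open_stays = {}  # epc -> index in `records` of that tag's current stay
    for r in sorted(readings, key=lambda r: (r.time, r.epc)):
        idx = open_stays.get(r.epc)
        if idx is not None and records[idx].location == r.location:
            # Same tag, same place: only the exit time moves forward.
            records[idx] = records[idx]._replace(time_out=r.time)
        else:
            # First reading of the tag, or the tag changed location.
            open_stays[r.epc] = len(records)
            records.append(StayRecord(r.epc, r.location, r.time, r.time))
    return records


# Hypothetical raw data: six readings for two tags.
raw = [
    Reading("epc-1", "dock", 1),
    Reading("epc-1", "dock", 2),
    Reading("epc-1", "dock", 3),
    Reading("epc-2", "dock", 2),
    Reading("epc-1", "shelf", 5),
    Reading("epc-1", "shelf", 6),
]
stays = to_stay_records(raw)
```

In this toy input, six raw readings shrink to three records; the saving grows with the number of repeated readings per location.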
Consequently, a simple data cleaning technique consists in converting raw data into stay records, i.e., tuples structured as $(e, l, t_i, t_o)$, where $t_i$ and $t_o$ are the times when an object enters and leaves a location $l$, respectively.

In this manuscript, we address the challenging problem of efficiently managing the tera-scale amount of data per day generated by RFID applications (cf. [4], [5], [6], and [7]), focusing on stay records as the basic block for storing RFID data. We advocate the use of a model based on the first principles of tensorial algebra, achieving a great deal of flexibility while maintaining simplicity and correctness.

Recently, two models have been proposed: a herd model and a single model. The former [6] proposes a warehousing model, observing that products usually move together in large groups at the beginning and continue in small herds towards the end of the chain: this allows aggregation and significantly reduces the size of the database. On the contrary, the latter [7] focuses on the movement of single, non-grouped tags, defining query templates and path encoding schemes to process tracking and path-oriented queries efficiently.

Both of these approaches offer limited flexibility when dealing with multidimensional queries at varying levels of granularity and complexity. To exploit the compression mechanism, such proposals have to fix the dimensions of analysis in advance and implement ad-hoc data structures that must be maintained. It is not known in advance whether objects are sorted and grouped, and it is therefore problematic to support different kinds of high-level queries efficiently.

In this paper we propose a novel approach based on multilinear algebra. Matrix operations are invaluable tools in several fields, from engineering and design to graph theory and networking; in particular, streams and co-evolving sequences can also be envisioned as matrices: each data source (a sensor) may correspond to a row, and each time-tick to a column (cf. [8]).
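The matrix view of streams mentioned above can be illustrated with a small sketch: each sensor becomes a row and each time-tick a column. The sample observations, the variable names, and the use of plain Python lists are assumptions made for illustration only; the second part, which adds a third index and sums it out, is likewise only a toy picture of a multi-index array and not the paper's tensor formalism.

```python
# Hypothetical observations: (sensor, time_tick, value).
observations = [
    ("reader-A", 0, 3),
    ("reader-A", 1, 5),
    ("reader-B", 1, 2),
    ("reader-B", 2, 4),
]

sensors = sorted({s for s, _, _ in observations})
ticks = sorted({t for _, t, _ in observations})
row = {s: i for i, s in enumerate(sensors)}  # sensor -> row index
col = {t: j for j, t in enumerate(ticks)}    # time-tick -> column index

# Dense matrix: one row per sensor, one column per time-tick.
matrix = [[0] * len(ticks) for _ in sensors]
for s, t, v in observations:
    matrix[row[s]][col[t]] = v

per_sensor = [sum(r) for r in matrix]      # row sums
per_tick = [sum(c) for c in zip(*matrix)]  # column sums

# A third index (here the EPC) gives a three-way array, stored sparsely
# as a dictionary keyed by (epc, sensor, time_tick).
cube = {
    ("epc-1", "reader-A", 0): 1,
    ("epc-2", "reader-A", 1): 1,
    ("epc-1", "reader-B", 2): 1,
}

# Summing over the EPC index yields a sensor-by-tick count matrix again.
counts = [[0] * len(ticks) for _ in sensors]
for (epc, s, t), v in cube.items():
    counts[row[s]][col[t]] += v
```

Storing only the non-zero entries, as in `cube`, is one simple way to keep memory use proportional to the number of readings rather than to the product of the dimensions.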
Standard matrix approaches focus on a standard two-dimensional space, while we extend the applicability of such techniques with the more general definition of tensors, a generalization of linear forms, which are usually represented by matrices. We may therefore take advantage of the vast literature, both theoretical and applied, regarding tensor calculus. Computational algebra is employed in critical applications, such as engineering and physics, and several libraries have been developed with two major goals: efficiency and accuracy.

Contribution. Leveraging such background, this paper proposes a general model of supply chains, mirrored by a formal tensor representation and endowed with specific operators, allowing both quick decentralized on-line processing and centralized off-line massive business logic analysis, according to the needs and requirements of supply chain actors. Our model and operations, inherited from linear algebra and tensor calculus, are therefore theoretically sound, and their implementation benefits from numerical libraries developed in the past, which exploit the underlying hardware, optimized for mathematical operations. Additionally, due to the properties of our tensorial model, we are able to attain two significant features: the possibility of conducting computations in memory-constrained environments such as mobile devices, and the possibility of exploiting modern parallel and distributed technologies, since matrices, and therefore tensors, can be dissected into several chunks and processed independently (i.e., also on-the-fly).

Outline. Our manuscript is organized as follows.
In Section 2 we briefly recall the available literature, while Section 3 is devoted to the introduction of tensors and their associated operations. The general supply chain model, accompanied by a formal tensorial representation, is supplied in Section 4 and subsequently put into practice in Sections 5 and 6, where we provide the reader with a method of analyzing RFID data within our framework. We benchmark our approach, whose asymptotic complexity is outlined in Section 7, on several test beds, and supply the results in Section 8.

2 Related Work

Real-life scenarios are affected by great inconsistencies and errors in data, and face the problem of dealing with huge amounts of data generated on a daily basis. To this end, knowledge representation techniques focus on performing deep analysis in these systems. There exist two main approaches to the management of RFID data: on the one hand, we may process data streams at run-time [9–12], while on the other hand, processing is performed off-line, once RFID data are aggregated, compressed, and stored [6, 7].

If we consider RFID data as a stream, the main issues are event processing and data cleaning. Wang et al. have proposed a conceptualization of RFID events based on an extension of the ER model [12]. Bai et al. have studied the limitations of using SQL to detect temporal events and have presented an SQL-like language to query such events in an efficient way [9]. Inaccuracies in RFID tag readings cause irregularities in the processing of data; therefore, cleaning techniques need to be applied to RFID data streams, as proposed in [10].

An alternative approach consists in warehousing RFID data and performing multidimensional analyses on the warehouse. The focus here is on data compression techniques and on storage models, with the goal of achieving a more expressive and effective representation of RFID data. A straightforward method is to provide support for path queries, e.g., find the average time for products