Lecture Notes in Computer Science 7720
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Abdelkader Hameurlain Josef Küng
Roland Wagner (Eds.)
Transactions on
Large-Scale
Data- and Knowledge-
Centered Systems VII
Editors-in-Chief
Abdelkader Hameurlain
Paul Sabatier University, IRIT
118 route de Narbonne, 31062 Toulouse Cedex, France
E-mail: hameur@irit.fr
Josef Küng
Roland Wagner
University of Linz, FAW
Altenbergerstraße 69, 4040 Linz, Austria
E-mail: {jkueng,rrwagner}@faw.at
ISSN 0302-9743 (LNCS) e-ISSN 1611-3349 (LNCS)
ISSN 1869-1994 (TLDKS)
ISBN 978-3-642-35331-4 e-ISBN 978-3-642-35332-1
DOI 10.1007/978-3-642-35332-1
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012952682
CR Subject Classification (1998): H.2.4, H.2.8, I.2, E.2, I.6.5, H.3.1, D.3.2-3,
G.3, E.1
© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved, whether the whole
or part of the material is concerned, specifically the rights of translation,
reprinting, re-use of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other way, and storage in data banks. Duplication of this
publication or parts thereof is permitted only under the provisions of the
German Copyright Law of September 9, 1965, in its current version, and
permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in
this publication does not imply, even in the absence of a specific statement,
that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing
Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
This volume is the second so-called regular volume of our journal. We find here
the result of the review process covering all the submissions that have been
sent directly to the journal administration since our last regular volume in
2012. Recognized scientists checked the quality and gave helpful feedback to
the authors – many thanks to them from the editorial board.
This review process resulted in the selection of five contributions. Their range
is from data management via data streams to service-oriented computing, and
from an abstract algebraic framework via RDF and ontologies to a conceptual
model framework.
In the first contribution, ‘RFID Data Management and Analysis via Tensor
Calculus’, Roberto De Virgilio and Franco Milicchio meet the challenge of
managing the increasing amount of RFID data in supply chains. They introduce a
novel algebraic framework for modeling a supply chain and for efficiently
performing analysis. The main advantage of this approach is its theoretical
soundness. Tensor calculus provides the background for both modeling and
querying.
Then, Abhirup Chakraborty and Ajit Singh address the issue of processing exact
sliding window joins between data streams in a memory-limited environment
having burstiness in stream arrivals. ‘Processing Exact Results for Windowed
Stream Joins in a Memory-Limited System: A Disk-Based Approach’ provides a
framework and proposes an algorithm to solve that problem. Disk storage and a
smart I/O strategy are used to meet the goals.
In the third article, we switch to semantic issues. In ‘Reducing the Semantic
Heterogeneity of Unstructured P2P Systems: A Contribution Based on a Dis-
semination Protocol’ Thomas Cerqueus, Sylvie Cazalens and Philippe Lamarre
address the situation that arises when, in a peer-to-peer system, different peers
are using different ontologies to annotate their data. First they propose a set of
measures to characterize different facets of heterogeneity. Then they introduce
a gossip-based protocol that allows some of them to be reduced.
‘Towards a Scalable Semantic Provenance Management System’, written by
Mohamed Amin Sakka and Bruno Defude, presents a new provenance management
system. It allows provenance sources to be imported and enriched semantically
to obtain a high-level representation of provenance. The power of semantic
web technologies is used for heterogeneous, multiple-source and decentralized
provenance integration. As a result, rich answers about the whole document
life cycle can be provided and trustworthiness can be increased.
Finally, Colin Atkinson, Philipp Bostan and Dirk Draheim concentrate on
service-oriented distributed systems. In ‘A Unified Conceptual Framework for
Service-Oriented Computing: Aligning Models of Architecture and Utilization’
they support various concepts and models and thereby make it possible to
customize and simplify each client developer’s view as well as the way in which
service providers develop and maintain their services. The paper provides a
unified conceptual framework for describing the components of service oriented
computing systems, in particular its foundations and concepts, and a small
example to show how the models are applied in practice.
Last but not least we would like to thank Gabriela Wagner for supporting us
with the organization and we hope that you enjoy this TLDKS volume.
October 2012 Abdelkader Hameurlain
Josef Küng
Roland Wagner
Editorial Board
Reza Akbarinia INRIA, France
Stéphane Bressan National University of Singapore, Singapore
Francesco Buccafurri Università Mediterranea di Reggio Calabria, Italy
Qiming Chen HP-Lab, USA
Tommaso Di Noia Politecnico di Bari, Italy
Dirk Draheim University of Innsbruck, Austria
Johann Eder Alpen Adria University Klagenfurt, Austria
Stefan Fenz Vienna University of Technology, Austria
Georg Gottlob Oxford University, UK
Anastasios Gounaris Aristotle University of Thessaloniki, Greece
Theo Härder Technical University of Kaiserslautern, Germany
Dieter Kranzlmüller Ludwig-Maximilians-Universität München, Germany
Philippe Lamarre University of Nantes, France
Lenka Lhotská Technical University of Prague, Czech Republic
Vladimir Marik Technical University of Prague, Czech Republic
Dennis McLeod University of Southern California, USA
Mukesh Mohania IBM India, India
Tetsuya Murai Hokkaido University, Japan
Gultekin Ozsoyoglu Case Western Reserve University, USA
Oscar Pastor Polytechnic University of Valencia, Spain
Torben Bach Pedersen Aalborg University, Denmark
Günther Pernul University of Regensburg, Germany
Klaus-Dieter Schewe University of Linz, Austria
Makoto Takizawa Seikei University Tokyo, Japan
David Taniar Monash University, Australia
A Min Tjoa Vienna University of Technology, Austria
Table of Contents
RFID Data Management and Analysis via Tensor Calculus
Roberto De Virgilio and Franco Milicchio
Processing Exact Results for Windowed Stream Joins
in a Memory-Limited System: A Disk-Based, Adaptive Approach
Abhirup Chakraborty and Ajit Singh
Reducing the Semantic Heterogeneity of Unstructured P2P Systems:
A Contribution Based on a Dissemination Protocol
Thomas Cerqueus, Sylvie Cazalens, and Philippe Lamarre
Towards a Scalable Semantic Provenance Management System
Mohamed Amin Sakka and Bruno Defude
A Unified Conceptual Framework for Service-Oriented Computing:
Aligning Models of Architecture and Utilization
Colin Atkinson, Philipp Bostan, and Dirk Draheim
Author Index
RFID Data Management and Analysis
via Tensor Calculus*
Roberto De Virgilio and Franco Milicchio
Dipartimento di Informatica e Automazione
Università Roma Tre, Rome, Italy
{dvr,milicchio}@dia.uniroma3.it
Abstract. Traditional Radio-Frequency IDentification (RFID) applications
have focused on replacing bar codes in supply chain management. The
importance of this new resource has soared in recent years, mainly due to
retailers' need to govern supply chains. However, due to the massive amount
of RFID-related information in supply chain management, attaining
satisfactory performance in analyzing such data sets is a challenging issue.
Popular approaches provide hard-coded solutions with high resource
consumption; moreover, they adapt poorly to multidimensional queries at
various levels of granularity and complexity.
In this paper we propose a novel model for supply chain management,
aiming at generality, correctness, and simplicity. The model is based on
the first principles of multilinear algebra, specifically of tensorial
calculus. Leveraging our abstract algebraic framework, we envision a system
allowing both quick decentralized on-line item discovery and centralized
off-line massive business logic analysis, according to the needs and
requirements of supply chain actors. Since our computations are based on
vectorial calculus, we are able to exploit the underlying hardware
processors, achieving a huge performance boost, as the experimental results
show. Moreover, by storing only the needed data, and benefiting from linear
properties, we are able to carry out the required computations even in
highly memory-constrained environments, such as mobile devices, and on
parallel and distributed technologies, by subdividing our tensor objects
into sub-blocks and processing them independently.
1 Introduction
A supply chain is a complex system for transferring products or services from a
supplier to a final customer. Aiming at improving the exchange of information,
retailers are investing in new technologies for managing supply chains. To this
end, Radio-Frequency Identification (RFID) is a recent and promising wireless
technology that enables a direct link between a product and its virtual
counterpart within information systems [1, 2]. Generally, RFID tags attached to
products store an identification code named EPC, which stands for Electronic
Product Code [3], a coding scheme for RFID tags aimed at uniquely identifying
them. The EPC is used as a key to retrieve relevant properties of an object
from a database, usually within a networked infrastructure.

* This work has been partially funded by the Italian Ministry of Research,
grant number RBIP06BZW8, FIRB project “Advanced tracking system in intermodal
freight transportation”.
A. Hameurlain et al. (Eds.): TLDKS VII, LNCS 7720, pp. 1–30, 2012.
© Springer-Verlag Berlin Heidelberg 2012
Typically, an RFID application generates a set of tuples, called raw data, each
a triple $(e, l, t)$, where $e$ is an EPC, $l$ represents the location where
an RFID reader has scanned the object $e$, and $t$ is the time when the reading
took place; while these are fundamental properties, others can be retrieved as
well, e.g., temperature, pressure, and humidity. Tags may have multiple
subsequent readings at the same location, potentially generating vast amounts
of raw data. Consequently, a simple data cleaning technique consists in
converting raw data into stay records, i.e., tuples structured as
$(e, l, t_i, t_o)$, where $t_i$ and $t_o$ are the times when an object enters
and leaves a location $l$, respectively.
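The conversion from raw readings to stay records can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Reading`, `StayRecord`, and `to_stay_records` are hypothetical, times are plain integers, and the exit time is assumed to be the last reading observed at a location.

```python
from itertools import groupby
from typing import NamedTuple


class Reading(NamedTuple):
    """A raw reading (e, l, t); hypothetical type for illustration."""
    epc: str
    location: str
    time: int


class StayRecord(NamedTuple):
    """A stay record (e, l, t_i, t_o); hypothetical type for illustration."""
    epc: str
    location: str
    time_in: int
    time_out: int


def to_stay_records(readings):
    """Collapse consecutive readings of a tag at one location into one stay."""
    stays = []
    ordered = sorted(readings, key=lambda r: (r.epc, r.time))
    for epc, tag_readings in groupby(ordered, key=lambda r: r.epc):
        # groupby merges only consecutive readings, so a tag that comes
        # back to a location later produces a separate stay record.
        for location, run in groupby(tag_readings, key=lambda r: r.location):
            times = [r.time for r in run]
            # Assumption: first and last reading approximate entry and exit.
            stays.append(StayRecord(epc, location, times[0], times[-1]))
    return stays


raw = [
    Reading("e1", "dock", 1),
    Reading("e1", "dock", 2),
    Reading("e1", "dock", 5),
    Reading("e1", "shelf", 7),
    Reading("e2", "dock", 3),
    Reading("e1", "dock", 9),
]
stays = to_stay_records(raw)
```

Here the three consecutive readings of `e1` at `dock` collapse into a single record, which is where the reduction in data volume comes from.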
In this manuscript, we address the challenging problem of efficiently managing
the tera-scale amount of data per day generated by RFID applications
(cf. [4], [5], [6], and [7]), focusing on stay records as the basic block to
store RFID data. We advocate the use of a model based on the first principles
of tensorial algebra, achieving a great deal of flexibility, while maintaining
simplicity and correctness.
Recently, two models have been proposed: a herd and a single model. The
former [6] proposes a warehousing model observing that, usually, products move
together in large groups at the beginning, and continue in small herds towards
the end of the chain: this allows aggregation and significantly reduces the
size of the database. On the contrary, the latter [7] focuses on the movement
of single, non-grouped tags, defining query templates and path encoding schemes
to process tracking and path-oriented queries efficiently.
Both of these approaches offer limited flexibility when dealing with
multidimensional queries at varying levels of granularity and complexity. To
exploit the compression mechanism, such proposals have to fix the dimensions of
analysis in advance and implement ad-hoc data structures to be maintained. It
is not known in advance whether objects are sorted and grouped, and therefore
it is problematic to support different kinds of high-level queries efficiently.
In this paper we propose a novel approach based on multilinear algebra. Matrix
operations are invaluable tools in several fields, from engineering and design
to graph theory and networking; in particular, streams and co-evolving
sequences can also be envisioned as matrices: each data source, e.g., a sensor,
may correspond to a row, while time-ticks correspond to columns (cf. [8]).
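As a small illustration of this view, the sketch below arranges timestamped samples into a matrix with one row per data source and one column per time-tick. The function name `streams_to_matrix`, the sample format, and the fill value for missing readings are assumptions made for the example only.

```python
def streams_to_matrix(samples, sensors, num_ticks, fill=0.0):
    """Arrange (sensor, tick, value) samples as a sensors-by-ticks matrix."""
    row_of = {name: i for i, name in enumerate(sensors)}
    # One row per data source, one column per time-tick.
    matrix = [[fill] * num_ticks for _ in sensors]
    for sensor, tick, value in samples:
        matrix[row_of[sensor]][tick] = value
    return matrix


samples = [("s0", 0, 21.5), ("s1", 0, 19.0), ("s0", 2, 22.0)]
matrix = streams_to_matrix(samples, ["s0", "s1"], num_ticks=3)
```

Row 0 then holds the sequence observed by `s0` and row 1 the sequence observed by `s1`, with the fill value wherever no sample arrived.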
Standard matrix approaches focus on a two-dimensional space, while we extend
the applicability of such techniques with the more general definition of
tensors, a generalization of linear forms, which are usually represented by
matrices. We may therefore take advantage of the vast literature, both
theoretical and applied, regarding tensor calculus. Computational algebra is
employed in critical applications, such as engineering and physics, and several
libraries have been developed with two major goals: efficiency and accuracy.
Contribution. Leveraging such background, this paper proposes a general
model of supply chains, mirrored by a formal tensor representation and endowed
with specific operators, allowing both quick decentralized on-line processing
and centralized off-line massive business logic analysis, according to the
needs and requirements of supply chain actors. Our model and operations,
inherited from linear algebra and tensor calculus, are therefore theoretically
sound, and their implementation benefits from numerical libraries developed in
the past, which exploit the underlying hardware, optimized for mathematical
operations. Additionally, due to the properties of our tensorial model, we are
able to attain two significant features: the possibility of conducting
computations in memory-constrained environments, such as mobile devices, and
the exploitation of modern parallel and distributed technologies, owing to the
possibility for matrices, and therefore tensors, to be dissected into several
chunks and processed independently (i.e., also on-the-fly).
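The chunking property can be illustrated on the simplest case, a matrix-vector product computed one row block at a time. This is a sketch under stated assumptions rather than the operators defined in the paper: the helper names are hypothetical, and plain Python lists stand in for the numerical libraries the authors mention.

```python
def mat_vec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def row_blocks(matrix, block_size):
    """Yield consecutive row blocks of at most block_size rows."""
    for start in range(0, len(matrix), block_size):
        yield matrix[start:start + block_size]


def blockwise_mat_vec(matrix, vector, block_size):
    """Compute the same product one block at a time."""
    result = []
    for block in row_blocks(matrix, block_size):
        # Each block needs only its own rows and the vector, so blocks
        # could be handled by separate workers or loaded one at a time.
        result.extend(mat_vec(block, vector))
    return result


A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
full = mat_vec(A, x)
chunked = blockwise_mat_vec(A, x, block_size=2)
```

Because each block contributes a disjoint slice of the result, only one block has to be in memory at any time, which is the property that makes memory-constrained and distributed execution possible.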
Outline. Our manuscript is organized as follows. In Section 2 we will briefly
recall the available literature, while Section 3 will be devoted to the
introduction of tensors and their associated operations. The general supply
chain model, accompanied by a formal tensorial representation, is supplied in
Section 4, and subsequently put into practice in Sections 5 and 6, where we
provide the reader with a method of analyzing RFID data within our framework.
We benchmark our approach, with asymptotic complexity outlined in Section 7,
on several test beds, and supply the results in Section 8.
2 Related Work
Real-life scenarios are affected by great inconsistencies and errors in data,
and face the problem of dealing with huge amounts of data, generated on a daily
basis. To this aim, knowledge representation techniques focus on performing
deep analysis in these systems. There exist two main approaches to the
management of RFID data: on one hand, we may process data streams at run-time
[9–12], while on the other hand, processing is performed off-line, once RFID
data are aggregated, compressed, and stored [6, 7].
If we consider RFID data as a stream, the main issues are event processing
and data cleaning. Wang et al. have proposed a conceptualization of RFID events
based on an extension of the ER model [12]. Bai et al. have studied the
limitations of using SQL to detect temporal events and have presented an
SQL-like language to query such events in an efficient way [9]. Inaccuracies of
RFID tag readings cause irregularities in the processing of data: therefore
cleaning techniques need to be applied to RFID data streams as proposed in [10].
An alternative approach consists in warehousing RFID data, and performing
multidimensional analyses on the warehouse. The focus here is on data
compression techniques and on storage models, with the goal of achieving a more
expressive and effective representation of RFID data. A straightforward method
is to provide a support to path queries—e.g., find the average time for products