PRSP: A Plugin-based Framework for RDF Stream Processing

Qiong Li (1,3), Xiaowang Zhang (1,3), Zhiyong Feng (2,3)
(1) School of Computer Science and Technology, Tianjin University, Tianjin 300350, P. R. China
(2) School of Computer Software, Tianjin University, Tianjin 300350, P. R. China
(3) Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, P. R. China

arXiv:1701.03854v1 [cs.DB] 14 Jan 2017

ABSTRACT

In this paper, we propose PRSP, a plugin-based framework for RDF stream processing. Within this framework, SPARQL query engines can be employed to process C-SPARQL queries in a simple way while maintaining the high performance of those engines. Taking advantage of PRSP, we can process large-scale RDF streams in a distributed context via distributed SPARQL engines. Besides, we can evaluate the performance and correctness of existing SPARQL query engines in handling RDF streams in a unified way, which extends their evaluation from static RDF (i.e., RDF graphs) to dynamic RDF (i.e., RDF streams). Finally, within PRSP, we experimentally evaluate correctness and performance on YABench. The experiments show that PRSP maintains the high performance of those engines in RDF stream processing, although there are some slight differences among them.

Keywords

RDF Stream; RSP; SPARQL; C-SPARQL

1. INTRODUCTION

RDF streams, as a new type of dataset, can model real-time and continuous information in a wide range of applications, e.g., environmental monitoring and Smart City. However, a data stream is an unbounded sequence of time-varying data elements and is therefore difficult to store.

Moreover, only a few RSP [1] (RDF Stream Processing) systems, such as C-SPARQL [2] and EP-SPARQL [3], have been implemented to support RDF streams, owing to the complexity of processing them. On the other hand, there are many popular and efficient SPARQL query engines that support only static RDF graphs, such as the centralized engines Jena [4], RDF-3X, and gStore, and the distributed systems TriAD and gStoreD [5]. How to employ those SPARQL query engines to evaluate continuous queries becomes an interesting problem.

In this paper, we provide a plugin-based framework for RDF stream processing named PRSP, which makes it possible to use high-performance RDF engines that are valid only for RDF graphs to process RDF streams. Moreover, within this framework, we can employ any RDF query engine to process RDF streams in a convenient way and compare the engines' performance under one unified framework. Users can then choose the system that best fits their requirements. For example, if they need to handle large-scale RDF graphs, distributed engines are the best choice.

2. PRELIMINARIES

RDF stream. An RDF stream is defined as an ordered sequence of pairs, each pair being made of an RDF triple and a timestamp $T$:

$(\langle S_i, P_i, O_i \rangle, T)$

C-SPARQL query. A continuous query is divided into three parts, $R_{query}$, $S(t)$, and $Q_{SPARQL}$, and is formally defined as follows:

$Q = [R_{query}, S(t), Q_{SPARQL}]$

• $R_{query}$ indicates the query registered by users, which is waiting to be addressed.
• $S(t)$ is the RDF stream registered by the RSP system; it defines the window size and the step size.
• $Q_{SPARQL}$ is a query in the standard RDF query language, i.e., SPARQL.

PROPOSITION 1. Let $Q$ be a C-SPARQL query. For any RDF stream $S$ and any present time $t$, the following holds:

$[\![ Q ]\!]_{(S,t)} = [\![ Q_{SPARQL} ]\!]_{Window(S,t)}$

Proposition 1 ensures that the evaluation problem of C-SPARQL queries over RDF streams can be reduced to the evaluation problem of SPARQL queries over RDF graphs. Moreover, Proposition 1 shows that the evaluation problem of C-SPARQL has the same computational complexity as that of SPARQL [2].

Consider the following query, a simple example from C-SPARQL. Line 1, matching $R_{query}$, tells the RSP system to register the continuous query TestQuery. $S(t)$, given in line 3, indicates that the stream data waiting to be processed is read through a sliding window of 5 seconds that slides every 5 seconds. $Q_{SPARQL}$, given in lines 2 and 4, is the SPARQL part of the query.

    1. REGISTER QUERY TestQuery AS
    2. SELECT ?obs
    3. FROM STREAM streams [RANGE 5s STEP 5s]
    4. WHERE {?obs observedProperty AirTemperature .}

3. THE ARCHITECTURE OF PRSP

PRSP is an extension of SPARQL for querying both RDF graphs and RDF streams, as shown in Figure 1. The continuous query and the RDF streams given as input to PRSP are transformed by the Query Rewriting plugin and the Data Transformer plugin, respectively. The outputs of these two plugins are then passed to the SPARQL API, and the results are produced by one of the SPARQL engines. The right box of Figure 1 consists of any SPARQL query engine, which is used as a black box for evaluating RDF graphs. The architecture contains three types of plugin: Data Transformer, Query Rewriting, and a SPARQL API connecting with the former two plugins.

[Figure 1: PRSP Architecture]

Query Rewriting. The query rewriting module takes continuous queries as input and applies transformation methods to generate two outputs, namely a SPARQL query and a window operator, which are handled by one of the SPARQL engines and by the Data Transformer module, respectively. After rewriting $Q$, we obtain $Q_{SPARQL}$.

Data Transformer. The data transformer module manages the RDF streams specified in the query via Esper or another DSMS. It transforms RDF streams into RDF graphs based on the window size and step size set by the window operator. After transforming $S$ w.r.t. $t$, we obtain $Window(S,t)$.

SPARQL API. PRSP defines a unified interface for RDF engines, which makes it possible and easy for SPARQL engines to process RDF streams. The current version of PRSP includes a few centralized engines, namely Jena, gStore, and RDF-3X, and two distributed engines, i.e., gStoreD and TriAD.

4. EXPERIMENTS AND EVALUATIONS

Experiments. All centralized experiments were carried out on a machine running Linux, which has 4 CPUs with 6 cores and 64 GB memory; the distributed experiments used 5 machines with the same performance. For evaluation, we utilized the YABench RSP benchmark [6], which uses a real-world dataset about water temperature. In our experiments, we used sliding windows with a window size of 5 seconds and a step size of 5 seconds. Considering that some engines cannot support complex queries, the experiments used two BGP queries, Q1 and Q1'. Q1 is a BGP query with four forms (Q1) from YABench, and Q1' is the rewriting of Q1 with three triples. Since RDF-3X did not work when the amount of stream data grew to 42,000 triples (i.e., s = 500), we chose five load scenarios (i.e., s = 100/200/300/400/500 sensors).

Evaluations. The performance of each engine under the five different input loads per window is shown in Fig. 2. When the load ranges from s = 100 to s = 500, the query response time increases to varying degrees for every engine except gStoreD. Fig. 3 shows the time of three processes, namely data load time (LT), query response time (RT), and engine execution time (ET), under s = 300, obtained from query Q1'. LT for RDF3X, gStore, and gStoreD occupies a large part of ET, resulting in their lower efficiency for processing RDF streams. Table 1 gives the precision and recall results of the experiments under three load scenarios (i.e., s = 100/300/500) in PRSP. As the input load per window grows, most engines show lower recall while keeping high precision.

[Figure 2: Querying time in different scenarios within PRSP. Panels: (a) the response time of Q1; (b) the response time of Q1'. x-axis: the number of sensors (100 to 500); y-axis: time [ms], logarithmic. Series: Jena, RDF3X, gStore, gStoreD, TriAD.]

[Figure 3: RDF stream for processing time in PRSP. Panels: (a) s = 100 sensors; (b) s = 500 sensors. x-axis: LT, RT, ET; y-axis: time (ms), logarithmic. Series: Jena, RDF3X, gStore, gStoreD, TriAD.]

Table 1: Precision/Recall results

                      Jena   RDF3X   gStore   gStoreD   TriAD
    Precision  s=100   99%    93%     100%     100%      97%
               s=300   97%    94%     100%     100%      93%
               s=500   85%    88%     100%     100%     100%
    Recall     s=100   95%    89%      75%      72%      95%
               s=300   94%    91%      88%      76%      92%
               s=500   92%    79%      77%      63%      91%

5. CONCLUSIONS

In this paper, we present PRSP, a plugin adaptable to SPARQL engines for processing RDF streams, which makes it feasible to employ various engines to process large-scale RDF streams. In the future, we will optimize PRSP further to improve its performance and correctness.

This work is supported by the National Key Research and Development Program of China (2016YFB1000603) and the National Natural Science Foundation of China (61672377).

6. REFERENCES

[1] http://www.w3.org/community/rsp/.
[2] D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, and M. Grossniklaus. Querying RDF streams with C-SPARQL. ACM SIGMOD Record, 2010, 39(1):20-26.
[3] D. Anicic, P. Fodor, S. Rudolph, et al. EP-SPARQL: a unified language for event processing and stream reasoning. In: Proc. of WWW'11, pp. 635-644.
[4] http://jena.sourceforge.net/ARQ.
[5] P. Peng, L. Zou, M. T. Özsu, L. Chen, and D. Zhao. Processing SPARQL queries over distributed RDF graphs. VLDB J., 2016, 25(2):243-268.
[6] M. Kolchin, P. Wetz, E. Kiesling, et al. YABench: A Comprehensive Framework for RDF Stream Processor Correctness and Performance Assessment. In: Proc. of ICWE'16, pp. 280-298.
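
ILLUSTRATIVE SKETCH

The pipeline of Section 3 (Query Rewriting, Data Transformer, SPARQL API) together with Proposition 1 can be pictured with a small Python sketch. It is not the PRSP implementation. All names in it (`rewrite`, `window_graph`, `evaluate`, `toy_engine`) are hypothetical, and the regular expressions cover only the clause form of the TestQuery example. The window is assumed to contain the triples with timestamps in (t - range, t], and the first window is assumed to close one step after time 0. The engine is passed in as a black-box function, standing in for Jena, gStore, or any other SPARQL engine.

```python
import re
from typing import Callable, List, NamedTuple, Sequence, Tuple

Triple = Tuple[str, str, str]
Stream = Sequence[Tuple[Triple, int]]  # (triple, timestamp in seconds)


class Window(NamedTuple):
    stream: str
    range_s: int  # window size in seconds
    step_s: int   # step size in seconds


# Hypothetical patterns: they handle only the syntax of the example query.
_REGISTER = re.compile(r"REGISTER\s+QUERY\s+\S+\s+AS\s+", re.IGNORECASE)
_FROM_STREAM = re.compile(
    r"FROM\s+STREAM\s+(\S+)\s*\[\s*RANGE\s+(\d+)s\s+STEP\s+(\d+)s\s*\]",
    re.IGNORECASE,
)


def rewrite(csparql: str) -> Tuple[str, Window]:
    """Query Rewriting: split Q into Q_SPARQL and a window operator."""
    match = _FROM_STREAM.search(csparql)
    if match is None:
        raise ValueError("no FROM STREAM ... [RANGE ns STEP ns] clause found")
    window = Window(match.group(1), int(match.group(2)), int(match.group(3)))
    sparql = _REGISTER.sub("", csparql, count=1)
    sparql = _FROM_STREAM.sub("", sparql, count=1)
    return " ".join(sparql.split()), window


def window_graph(stream: Stream, t: int, range_s: int) -> List[Triple]:
    """Data Transformer: Window(S, t) as a plain list of triples."""
    return [triple for triple, ts in stream if t - range_s < ts <= t]


def evaluate(
    csparql: str,
    stream: Stream,
    now: int,
    engine: Callable[[str, List[Triple]], list],
) -> List[Tuple[int, list]]:
    """Evaluate Q_SPARQL over Window(S, t) each time the window slides."""
    sparql, window = rewrite(csparql)
    results = []
    t = window.step_s  # assumption: first window closes one step after time 0
    while t <= now:
        graph = window_graph(stream, t, window.range_s)
        results.append((t, engine(sparql, graph)))
        t += window.step_s
    return results


def toy_engine(sparql: str, graph: List[Triple]) -> list:
    """Stand-in for a real SPARQL engine: hard-codes the example pattern."""
    return sorted(
        s for s, p, o in graph
        if p == "observedProperty" and o == "AirTemperature"
    )


QUERY = """REGISTER QUERY TestQuery AS
SELECT ?obs
FROM STREAM streams [RANGE 5s STEP 5s]
WHERE {?obs observedProperty AirTemperature .}"""

STREAM = [
    (("obs1", "observedProperty", "AirTemperature"), 1),
    (("obs2", "observedProperty", "WindSpeed"), 3),
    (("obs3", "observedProperty", "AirTemperature"), 7),
]
```

Calling `evaluate(QUERY, STREAM, 10, toy_engine)` evaluates the rewritten query once per window, at t = 5 and t = 10. Keeping the engine behind a single function argument mirrors the role the SPARQL API plays in the architecture: the rewriting and windowing steps do not depend on which engine is plugged in.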
