ebook img

Querying Business Processes PDF

12 Pages·2006·0.33 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Querying Business Processes

Querying Business Processes ∗ Catriel Beeri Anat Eyal Simon Kamenkovich The Hebrew University Tel Aviv University Tel Aviv University [email protected] [email protected] [email protected] Tova Milo Tel Aviv University [email protected] ABSTRACT view of the process, as a graph of data and activity nodes, con- nectedbycontrolanddataflowedges. Designsareautomatically WepresentinthispaperBP-QL,anovelquerylanguageforquery- convertedtoBPELspecifications.Thesecanbeautomaticallycom- ingbusinessprocesses. TheBP-QLlanguageisbasedonanintu- piledintoexecutablecodethatimplementsthedescribedBP[30]. itivemodelofbusinessprocesses, anabstractionoftheemerging DeclarativeBPELspecificationsgreatlysimplifythetaskofsoft- BPEL(BusinessProcessExecutionLanguage)standard. Itallows waredevelopmentforBPs.Moreinterestinglyfromaninformation userstoquerybusinessprocessesvisually,inamannerveryanal- managementperspective,theyalsoprovideanimportantnewmine ogous to how such processes are typically specified, and can be ofinformation.Considerforinstanceauserwhotriestounderstand employedinadistributedsetting,whereprocesscomponentsmay howaparticulartravelagencyoperates. Shemaywanttofindan- beprovidedbydistinctproviders(peers). swerstoquestionssuchas: CanIgetapricequotewithoutgiving We describe here the query language as well as its underlying firstmycreditcarddetails? Whatshouldonedotoconfirmapur- formalmodel. Weconsiderthepropertiesofthevariouslanguage chase?Whatkindofcreditservicesareusedbytheagency,directly componentsandexplainhowtheyinfluencedthelanguagedesign. orindirectly,(i.e. bytheotherprocessesitinteractswith)? Ob- In particular we distinguish features that can be efficiently sup- viously,suchqueriesareofgreatinteresttobothindividualusers ported,andthosethatincuraprohibitivelyhighcost,orcannotbe andtoorganizationsinterestedinusingoranalyzingBPs.Answer- computedatall. Wealsopresentourimplementationwhichcom- ingthemisextremelyhard(ifnotimpossible)whentheBPlogic plies with real life standards for business process specifications, iscodedinacomplexprogram. Itispotentiallymucheasiergiven XML,andWebservices,andisusedintheBP-QLsystem. adeclarativespecificationlikeBPEL.Foranorganizationthathas accesstoitsownBPELspecifications,aswelltothoseofcooperat- 1. INTRODUCTION ingorganizations,theabilitytoanswersuchqueries,inapossibly A business process (BP for short) consists of a group of busi- distributedenvironment,isofgreatpracticalpotential. nessactivitiesundertakenbyoneormoreorganizationsinpursuit Tosupportsuchqueries,oneneedsanadequatequerylanguage, ofsomeparticulargoal. Itusuallydependsuponvariousbusiness andanefficientexecutionengineforit. Toaddressthisneed, we functions for support, e.g. personnel, accounting, inventory, and presentinthispaperBP-QL,anewquerylanguagewhichallows interactswithotherBPs/activitiescarriedbythesameorotheror- for an intuitive formulation of queries on BP specifications, and ganizations. Consequently, the software implementing such BPs queryexecutioninadistributedcross-organizationenvironment. typicallyoperatesinacross-organization,distributedenvironment. Before presenting our results, let us highlight briefly some of It is common practice to use XML for data exchange between thechallengesinqueryingBPspecificationsingeneral,andBPEL BPs,andWebservicesforinteractionwithremoteprocesses[34]. onesinparticular. TherecentBPELstandard(BusinessProcessExecutionLanguage FlexiblegranularityBPspecificationsmaybeabstractlyviewed [7], alsoidentifiedasBPELWSorBPEL4WS),developedjointly as a set of nested graphs, possibly with recursion: The graphs byBEASystems,IBM,andMicrosoft,combinesandreplacesIBM’s structure captures the execution flow of the process components; WebServicesFlowLanguage(WSFL)[27]andMicrosoft’sXLANG Thenestingcomesfromthefactthattheoperations/servicesused [35]. ItprovidesanXML-basedlanguagetodescribenotonlythe in a process are not necessarily atomic and may have a complex interfacebetweentheparticipantsinaprocess,butalsothefullop- internal structure (which may itself be represented by a graph); erationallogicoftheprocessanditsexecutionflow. The recursion is due to the fact that a process may call itself in- Commercial vendors offer systems that allow to design BPEL directly,throughcallsitmakestootherprocesses.Usersmaywish specification via a visual interface, using a conceptual, intuitive to ask coarse-grain queries that consider certain process compo- ∗TheresearchhasbeensupportedbytheIsraelScienceFoundation. nentsasblackboxesandallowforhighlevelabstraction,aswellas fine-grainedqueriesthat“zoom-in”onalltheprocesscomponents, possiblyrecursively. Anadequatequerylanguagemustthusallow Permissiontocopywithoutfeeallorpartofthismaterialisgrantedprovided userstoquerytheprocessesatdifferent,flexible,granularitylevels. thatthecopiesarenotmadeordistributedfordirectcommercialadvantage, DistributionAsmentionedabove,BPstypicallyoperatesinacross- theVLDBcopyrightnoticeandthetitleofthepublicationanditsdateappear, andnoticeisgiventhatcopyingisbypermissionoftheVeryLargeData organization,distributedenvironmentwhereeachpeerholdsaset BaseEndowment. Tocopyotherwise,ortorepublish,topostonservers ofBPsandmayprovide(resp.use)servicesto(of)remotepeers.If ortoredistributetolists,requiresafeeand/orspecialpermissionfromthe aservice’sinternalflowhasbeendefinedinBPEL,andtheservice publisher,ACM. providersmakethisspecificationavailabletotheircooperatingor- VLDB’06,September12-15,2006,Seoul,Korea. ganizations(sayviaawebservice),usersmaywishtozoom-inon Copyright2006VLDBEndowment,ACM1-59593-385-9/06/09 theseremotecomponentsaswelltoquerytheservicespecification. Observe that, in general, several modes of querying business processes are possible. One can query the specifications as data PathsextractionWhenqueryingBPs,usersmaybeinterestedin (e.g. “doesthespecificationincludeapathfromactivityAtoac- retrieving,asananswer,thequalifyingflowpaths(asforinstance tivityB”). Onecanalsoaskaboutpatternsthatmayoccurwhen inthequery“WhatshouldIdotoconfirmmypurchase?”). Asthe theprocessesareexecuted(e.g. “cantherebearunofthesystem numberofrelevantpathsmaybelarge(oreveninfiniteinrecursive whereactivityAisfollowedbyactivityB”).Onecanalsomonitor processes)amajorchallengeistoprovidestheuserswithacom- runsasprocessesexecute,orposequeriesonlogsofpastruns. pactfiniterepresentationofthe(possibleinfinite)answer. BP-QLisaquerylanguageforprocessspecifications,1notabout EaseofqueryingAsmentionedabove,theBPELstandardoffers theirpossibleruns. Thisisfortwomainreasons. First,querying anXML-basedlanguagefordescribingtheoperationallogicofa thepossiblerunsofasystemisaverificationproblem[22]andis BP.SinceaBPELspecificationisessentiallyanXMLdocument,a typically of very high complexity (from NP-hard for very simple naturalquestioniswhynotqueryitdirectly,usingXQuery?Akey specificationstoundecidableinthegeneralcase[28]).Second,the observationisthattheBPELXMLformatis(1)verycomplexand analysisofrunsrequiresaspecificationtohaveawelldefinedse- (2)wasdesignedwitheaseofautomaticcodegenerationinmind; mantics.Unfortunately,BPELisnotbasedonaformalmodel[28]. however,itisextremelyinconvenientforquerying.Toexpresseven To avoid these obstacles and guaranty complexity that is polyno- averysimpleinquiryaboutaprocessexecutionflow,oneneedsto mialinthesizeofthedata,BP-QLignorestherun-timesemantics write a fairly complex XQuery query that performs an excessive ofcertainBPELconstructssuchasconditionalexecutionandvari- numberofjoins.Furthermore,evenifamorequery-friendlyXML ablevaluesandfocusesonthegivenspecificationflow.Webelieve representationforithadbeenchosen(asindeedisdoneinternallyin this approach offers a reasonable balance between expressibility ourimplementation),XQuery,asis,wouldstillnotbeadequatefor and complexity. Note that querying of specifications in fact “ap- thetask: XQueryonlyreturnsdocument elements, butnotpaths, proximates”thequeryingofruns(e.g.onlyspecificationsthatcon- itdoesnotsupportqueryingatdifferentlevelsofgranularity,and taintwogivenactivitiesmaypotentiallyhaverunswherebothoc- it does not offer tools for controlling distributed querying. Last cur).Hence,evenwhenfullrunverificationisdesired,BP-QLcan butnotleast,queryinganXMLrepresentationismuchmorediffi- be used as an efficient means to narrow the search space for the cultthanqueryingdirectlyaconceptualmodel.Essentially,easeof morecostly, interpretationdependent, verification. Itcan alsobe queryingrequiresananintuitive,conceptual,datamodel,coupled usedtoselecttheprocesspartstobemonitoredatruntime[32]. withamatching,equallyintuitive,querylanguage. ContributionsWenowstatethecontributionsofthispaper. The BP-QL query language presented in this paper addresses 1. WepresentBP-QL,anewgraphicalquerylanguagethatal- theseissues. ItisbasedonanintuitivemodelofBPs,anabstrac- lowsforintuitivequeryingofprocessspecifications, byof- tionoftheBPELspecification,alongwithagraphicaluserinterface feringadatamodelandaninterfacesimilartothoseusedfor thatallowsforsimpleformulationofqueriesoverthismodel. Ina BPsspecification. Itallowstoretrievepaths,andoffersfa- sense,itfollowsthesamedesignprinciplesthatguidedcommercial cilitiesforqueryingatdifferentlevelsofgranularity,andfor vendorsinthedevelopmentofgraphicaleditorsforthespecifica- controllingdistributedquerying. tionofBPELprocesses: ithidesfromtheusersthetediousBPEL 2. Wepresentaformalmodelforsystemsofprocesses,andfor XML details and allows for more natural query formulation. In- ourquerylanguageonsuchsystems,basedongraphgram- deed,wewillseethatthetightanalogybetweenhowBPsarespec- mars[21]. Thismodelallowstodistinguishbetweenquery ifiedinsucheditorsandhowtheyaregraphicallyqueriedinBP- featuresthatcanbeefficientlysupported,andthosethatin- QL,facilitatesintuitivequerying. BP-QLalsooffersfacilitiesfor curaprohibitivelyhighcost, orcannotbecomputedatall. controlling granularity and distribution in query formulation and Using this model, we explain how to construct a finite and allowspathsinqueryresults. intuitiverepresentationofthe(possiblyinfinite)answersof At the core of the BP-QL language are BP patterns that allow queriesintimepolynomialinthesizeofthespecifications. userstodescribethepatternofactivities/dataflowthatareofinter- 3. Finally,wedescribethesystem’simplementation,highlight- est. BPpatternsaresimilartothetree-andgraph-patternsoffered ingthemainchallengesfacedandthesolutionstaken. byexistingquerylanguagesforXML[36]andgraph-shapeddata AfirstprototypeofBP-QLwasdemonstratedin[5],whereonlya [15,13,31],butincludetwonovelfeaturesdesignedtoaddressthe veryhighlevelviewofthelanguagewaspresented. Thepresent issues mentioned above. First, BP-QL supports navigation along paperprovidesacomprehensivedescriptionofthelanguage,ofits twoaxis: (1)thestandardpath-basedaxis,thatallowstonavigate underlyingformalformalandofitsimplementation. Thepaperis through,andquery,pathsinprocessgraphs,and(2)anovelzoom- organizedasfollows. Section2introducesBP-QL informallyvia inaxis,thatallowstonavigate(transitively)insideprocesscompo- arunningexample. Theunderlyingformalmodelispresentedin nents(localaswellasremoteones)andquerythematanydepth Section 3. The system implementation is described in Section 4. ofnesting. Second,pathsareconsideredfirstclassobjectsinBP- WeconcludeinSection5,consideringrelatedandfuturework. QL and can be retrieved, and represented compactly, even when involvingactivitiesperformedondistinctpeers. 2. SYSTEMOVERVIEW Together,thesefeaturesallowforsimpleformulationofqueries onBPs. However,theymaketheevaluationofqueriesmuchmore WepresenthereaninformaloverviewofBP-QLviaarunning intricatethanthatoftraditionalXML/graphpatterns.Indeed,some example. To illustrate the features of BP-QL , we will consider queries that can easily be evaluated on flat graphs/trees may be- a set of business processes (BPs) used by a consortium offering comecomputationallyexpensive(orevenundecidable)whennested travel-relatedservices. Theseincludeflightandhotelreservation, graphsareconcerned. Tokeeptheevaluationofqueriestractable, carrental,creditandaccountingservices. Theprocesses,andtheir we had identified these problematic scenarios and carefully de- BPEL specifications, reside and operate on distinct peers. The signedthelanguagesothattheyareavoided,andpolynomial-time specificationsincludetheinteractionsbetweenthevariousprocesses. queryevaluationisguaranteed. Ouranalysisisbasedonmodeling 1A variant for monitoring and querying of logs is planned future systemsofprocessesandqueriesasgraphgrammars[21]. research. Wefirstshowhowprocessesarespecified,viathesystem’sgraph- icaluserinterface,andthenillustratehowtheycanbeinterrogated andqueriedwithBP-QL.ThegraphicalspecificationofBPsthat weuseisfairlystandard,andissimilartothoseofferedbycommer- cialvendors(e.g.[30]).ThenoveltyhereisintheBP-QLgraphical query language, designed especially for querying such specifica- tions. The ease of query formulation is illustrated by comparing thequerygraphicalinterfacetothatusedfortheprocessesspecifi- cation;thereisatightanalogybetweenhowprocessesarespecified andhowtheyarequeried. RunningexampleOurrunningexampleisalongthelinesofW3C’s travelagentscenario[1].Alpha-Tours,afictionaltravelagency,of- fers to its potential clients the ability to book complete vacation packages: plane tickets, hotels, car rentals, and so on. The main stepsofthereservationprocessareasfollows: Theuserprovides adestination,somedates,possiblysomeconstraints,tothetravel agencyservice. Next,theserviceobtainsinformationaboutpossi- bledealsfromairlines,hotelsandcarrentalagenciesandpresents themtotheuser,whichselectstheonessheisinterestedin. Those are reserved by the agency. Finally, the user may cancel or con- Figure1:TravelAgency. firmthereservation,passinghercreditcarddetails.Theairline,car rentalandhotelservicescontactacreditcardserviceforpayment authorizationbeforetheyacknowledgesthereservation. Wenowdemonstratehowtheservicesarespecifiedandqueried. All screenshots were taken with our BP-QL visual designer and querytool. 2.1 BusinessProcesses AsystemconsistsofasetofBPs, possiblyresidingondistinct peers.ABPspecificationincludes: 1. Somegeneraldescriptionoftheprocessproperties,including itsname,capabilities,theserviceprovider,andsoon. 2. The data used in the process, namely the process variables andtheinputandoutputparametersfortheparticipatingac- tivities/services. 3. Theactivitiesofwhichtheprocessiscomposed. 4. Adescriptionoftheprocessoperationalanddataflow. Visually, the specification of a BP is represented as a directed labeled graph, with three types of nodes: property nodes (for 1), drawnashexagons; datanodes(for2),drawnasellipses; andac- tivitynodes(for3),drawnasrectangles. Edgesthatconnectdata Figure2:ZoomingintosearchTrip. andactivitynodes,calleddataflowedges,describewhichdatais read or output by which activity. Edges between activity nodes, taxonomiesandontologiesthatprovidestandarddefinitionsofthe calledactivityflowedges,describetheoperationalflow.Tocapture servicedomain.2 certain particular aspects of the operational flow of BPs, activity Theprocessflow(ontheleft)anditsdataelements(ontheright), nodesmaybeidentifiedasprovidedoperationsorrequestedoper- are displayed in separate boxes. The small rectangles at the top ations. These describe the services offered by a process to other andbottomoftheactivityflowareitsentryandexitpoints. The processes,andtheexternalservicesthatitrequests,resp. Activity BPcontainsfourcompoundactivitynodes,namelysearchTrip, nodesmayalsobedistinguishedasatomicorcompound.Thelatter reserveTrip,confirmTripandcancelTrip.Ashortthick representinvocationsofcomposite(possiblyremote)processesand incomingarrowindicatesaprovidedoperation. Aclientmayin- aredenotedbytwolittleboxesatthetopleftcorneroftheactivity vokeeachofthefourprovidedoperations(attheappropriatepoint icon. Theinterpretationofcompoundnodesisbasedontheideas intheflow). Edgesbetweendatanodesandactivitynodesdepict ofstatechart[23]:azoom-inallowstoreplaceacompoundactivity the data flow. For example, the client’s trip request is imported byadetaileddescriptionoftheprocessthatitinvokes. whenthesearchTripactivityisentered. Theresultsarestored Forillustration,considertheBPdepictedinFigure1. Itrepre- inatripResultvariable. Onecanzoomintoacompoundactivitynodetoseewhatisin- sentsthetravelagencyfromourrunningexample. Thelabelun- side. Figure 2 shows the details of searchTrip. We can see dereachnodeisitsname. Eachnodecarriessomeinformationon thatthetravelagencyinteractswithotherservicestofulfillclient the process property/data/activity that it represents, which can be requests. The short thick arrows outgoing of the searchCars, viewed by clicking on it. For instance, the property nodes at the top of the figure describe the process, its provider, and its capa- 2Implementation-wisetheyarestoredinaUDDIrepository. bilities. Most attributes of these nodes are references to external Figure4:Creditservicesinvokedwhensearchingfortrips. Figure3:Findprovidedoperations. searchFlights,andsearchRoomsiconsindicatethatthose arerequestedoperations. Here,thenodeattributes(notdisplayed in the figure) provide the parameters (URL, operation name, ...) that allow one to invoke the relevant Web service. If the service providersmaketheirBPELspecificationavailable,onecanzoom inalsointothesenodesaswelltoseetheservicespecification. Thefigurealsoshowsdataflowedges(forclaritysomeofthese edges are omitted). For example, the set of airlines that the agency works with is imported when searchFlights is en- tered. Theresultsofsearchingexternalairlineservicesarestored inpossibleFlights. Beforemovingontoquerying,wehighlighttwotypesofcycles Figure5:(a)Negation (b)Pathconstraints. thataspecificationmaycontain.First,thegraphofagivenBPmay containcycles,indicatingthatcertainactivitiesmayberepeatedan inselectionconditionsontheirattributesanddataandforjoins.We unboundednumberoftimes.Second,inasystemconsistingofsev- alsosupportnegation(denotedbydashednodesandedges). eralprocessesthatcalleachother,aBPmaycallitselfindirectly, We demonstrate the use of BP-QL via some example queries. throughcallsitmakestootherprocesses. Thisisanotherkindof Each query describes a process pattern that a user is looking for. Thecheckboxesnexttonodesandedgesmarkselectednodesand cyclicstructure: hereonecouldzoomintothecorrespondingcom- paths,resp.,thattheuserwantstotoretrieveasthequeryresult. poundoperationanunboundednumberoftimes. Notethatwhen querying BPs, users are often interested to retrieve flow paths as EXAMPLE 2.1. ThequeryinFigure3searchesforoperations answers(asforinstanceinthequery“Whatarethepossibleways providedbytheAlpha-ToursBPandtheservicesituses. The topurchaseaplaneticket?”).Inthepresenceofcycles,thenumber doubleheadededgesinsidethebehaviorboxindicatethatactivi- ofqualifyingpathsmaybeinfinite.Oneofthecontributionsofour tiesatanydistancefromthestart/endnodesmayqualify;theshape workistoprovideanintuitive,finite(andcompact)representation ofthenoderestrictsthesearchtoprovidedoperations. Thedou- forsuchpossiblyinfiniteanswers. bleboundingofthebehaviorboxdenotesunboundedzoom-in;we lookforoperationsprovidedbytheBPand(transitively)thecom- 2.2 TheBP-QLQueryLanguage poundactivities/servicesthatitinvokes. Thezoom-inisrestricted GiventhatBPsaredefineddeclaratively,wecanquerythespec- toactivities/serviceswhosespecificationsresideonthesamepeer, ificationstolearnabouttheprocesses. Inourrunningexample,a since the deepSearch attribute is set to local. Setting it to global user may want to ask questions like: ’Which operations are pro- willextendthesearchtoremoteservicesaswell. (cid:163) vided by the travel agency service?’, ’Which services are called directly or indirectly by the service?’, ’Does the service allow to EXAMPLE 2.2. Figure4illustratesajoinoperation.Thequery makeareservationwithoutfirstgivingcreditcarddetails? andif checkswhichVISAcreditcardservicesarecalled(directlyorindi- rectly)bytheAlphaTour’sconfirmTripactivity. Weusevari- so,whatdoesoneneedtodoformakingareservation?’. Wepro- ablestodefinethejoinconditions. Thejoinisvaluebased,i.e. the ceedtoexplainhowBP-QLcanbeusedtoexpresssuchqueries. nodes’attributesarecheckedtohavethesamevalues. (cid:163) BP-QLquerieslookmuchlikethespecifications. Forquerying BPs,BP-QLoffersBPpatternswhich,intuitively,playforBPsa roleanalogoustothatplayedforXMLtreesbytreepatternqueries. EXAMPLE 2.3. ThequeryinFigure5(a)illustratestheuseof Theydescribethepatternofactivity/dataflowthatisofinterestto negation. IttestswhethertheusersofAlpha Toursarenever theuserandallownavigationalongtwoaxis:path-basedandzoom- requiredtologinwhensearchingforflights. Formally, thisisex- inbased. Followingtheuseof/and//inXPath[36]fordenoting pressed by stating that a path to the searchFlights activity singleandmultiplestepnavigation,ourPBpatternsuseedgeswith thatpassesthroughaloginactivitydoesnotexists(dashededges singleanddoubleheadstodenotesingleandmultipleedgepaths, and nodes denote negation). The existing flow paths leading to resp.Similarly,toallowausertoqueryaboutflowsthatarenested searchFlights are then retrieved (as indicated by the small at any depth in the zoom-in hierarchy, compound activity nodes checkboxnexttothedoubleheadededge). mayhavedoublyboundedboxes,todenoteanunboundedzoomin A more lenient query, that retrieves, the paths without a login leading to searchFlights, can be expressed by attaching a intotheactivities’internalspecifications. Thenodesandedgesof variable, say x, to the edge, along with the selection condition BPpatternscanbeassociatedwithvariables,andthesecanbeused Figure6:Dataflow. x ∈ (Σ\“login”)∗. SeeFig.5(b). Regularpathexpressionsas constraintsonpathsarediscussedinSection3.4. (cid:163) Figure7:(a)searchFlights. (b)Infinitesetofresults. EXAMPLE 2.4. Finally, Figure 6 illustrates querying the data flow. Thequerysearchesfordataelementsthatare(transitively) affectedbythesearchRequest,andserveasinputforsending thesuggestedtripsbacktotheclient. Bydefault,adoubleheaded edgebetweentwodata(resp.activity)nodesdenotespathsconsist- ingonlyofdata(activity)flowedges. Tooverridethedefault,(e.g. considerpathswithallsortsofedges)onecanattach,asabove,a variabletotheedgewithanappropriateselectioncondition. (cid:163) 2.3 Query Semantics (informally) Whenaqueryisevaluated,itspatternsarematchedagainstthe systemBPs.Itsnodesandedgesareassignedactivity/data/property nodesandexecution/dataflowpaths,resp. Thesearethenusedto constructthequeryresult. More precisely, the semantics of a query q on system S is de- finedasfollows. Anembeddingisafunctionfromthenodesand Figure8:Finiteresultrepresentation. edges of q to nodes, edges and paths of S, that satisfies the ob- system’sdesign. Inparticularwedistinguishfeaturesthatcanbe vious constraints: Nodes are mapped to nodes of the same type, efficientlysupported,andthosethatincuraprohibitivelyhighcost, single/double-head edges are mapped to edges/paths between the orcannotbecomputedatall. Tosimplifythepresentationwefirst correspondingendpoints.Whenacompoundquerynodeisdoubly- considerabasicdatamodelandquerylanguage,thenenrichthem bounded,nodesandedgesinitmaybemappedtonodesandpaths toobtainthefullfledgedmodel. inaprocessobtainedbyanynumberofzoom-insintotheactivity’s specification. For nodes and edges are associated with variables, 3.1 SimpleBusinessProcessesandSystems thequeryconstraintsonthesevariablesmustbesatisfiedaswell. WeassumetheexistenceofoftwodomainsN ofnodesandL Eachembeddingdefinesoneresultforthequery.Thenumberof ofnodelabels.Listhedisjointunionofseveraldomainsincluding qualifyingresultsmaybelarge(possiblyinfiniteinthepresenceof datavalues,attributenames,dataelementnames,processproperty cycles). However,BP-QLprovidesaconcise,intuitive(andfinite) names,andatomicandcompoundactivitynames.Weassumesome representationfortheset.Weillustratethisbelowwithanexample distinguishedpropertynames. Theseareintroducedbelow,inthe andprovidemoredetailsontheconstructioninSection3. appropriatecontexts. EXAMPLE 2.5. AssumethatthesearchFlightsservice(in- vokedbysearchTripinFigure2)hasthestructuredepictedin Business graphs and processes. We model a (simple) BP Figure7(a). Theusercaneitherloginandcheckfortheavailabil- asadirectedlabeledgraphwithnodesoftwotypes: concreteand ity of various flights, or call, again, Alpha Tours’ searchTrip compound.Concretenodesrepresentprocessproperties,attributes, servicetostartanewsearch. Now,reconsiderthequeryinFigure data elements, and atomic activities. Compound nodes represent 5(b),thatretrievesthepathsleadingtosearchFlightsthatdo compoundactivities,namelycallsof(possiblyremote)operations. notrequirealogin. Becauseofthepotentialcyclicserviceinvoca- TwodistinguishednodesoftheBPgraphrepresentitsstartandend tion, searchTripcaninfactbereachedbyaninfinitenumber activities.Formally, ofpaths, asdepictedinFigure7(b). Ratherthanlistingallthese DEFINITION 3.1. A(simple)businessgraphisapairg=(G,λ), paths,theuserisprovidedwithacompactrepresentation(seeFig- whereG = (N,E)isadirectedgraphinwhichN ⊂ N isafi- ure8)thathighlightstherecursivestructureoftheresults. (cid:163) nitesetofnodes,andEisasetofedgeswithendpointsinN;and λ : N → L isalabelingfunctionforthenodes. Dependingon 3. THE FORMAL MODEL theirlabeltype,werefertothenodesingasactivitynodes,value nodes, property nodes, etc. Nodes labeled by compound activity In this section we briefly present the formal model underlying namesarecalledcompoundnodes;allothernodesingarecalled theBP-QLquerylanguage. Wediscussthepropertiesofthevar- concrete. ious language components and explain how these influenced our A(simple)businessprocess(BP)isatriplep=(g,start,end), where: g isabusinessgraph;start,endaretwodistinguished activity nodes in g; and each activity node in g resides on some pathfromstarttoend. (cid:163) Notethatthestartandendnodesneednotbedistinct.Forexample, aprocessmayconsistsofjustoneactivitynode,whichisbothits startanditsend.Alsonotethatonlyactivitynodesarerestrictedto bebetweenthestartandendnodes.Recallfromsection2thatactiv- itiescanbeclassifiedasrequestedorprovided.Thisismodeledby assumingtwoparticularpropertynamesprovidedandrequested, andattachingtoactivitynodesappropriatepropertynodes. Forexample,Figure9showsseveralBPs(ignorethe“bubbles” fornow).Asbefore,weusesquaresforactivitynodesandhexagons Figure9:AsystemofBPs. forpropertynodes. TheleftmostBPhasasinglecompoundactiv- itynode,whichisbothitsstartandend. Theoneinthecenterhas twodistinctstartandendnodes,andfourprovidedoperations. As mentioned above, compound nodes represent calls to com- positeoperations. Theinternalstructureoftheseoperationsisnot part of the business process graph and is given separately, as we explainnext. Simple systems. Asystemisacollectionofbusinessprocesses (orgraphs), alongwithamappingbetweencompoundnodesand theirimplementations–theprocessestheyinvoke. Inthegeneral case, a system may be distributed. This is ignored for now, for simplicity,andisdiscussedinSection3.4. DEFINITION 3.2. AsystemSofbusinessprocesses(resp.graphs) isapair(P,τ),wherePisafinitesetofbusinessprocesses(graphs), and τ is a (possibly partial) function, called the implementation function,fromthecompoundactivitynodesinPtobusinessprocesses (graphs)inP.3 (cid:163) Figure10:Arefinedsystem(afteronestep). Thisdefinitioncaneasilybeextendedtodistinguishbetweenroot processes,thataredirectlyaccessible,andimplementationprocesses, initsplace,withtheincoming/outgoingedgesofnnowbeingcon- thatareaccessibleonlyasimplementationsofotherprocesses. To nectedtothestart/endnodesofτ(n),resp. Ifnwasthestart/end simplifythepresentationweomitthishere. nodeofp,thestart/endnodeofτ(n)nowtakesthisrole.] Theimplementationfunctionispartialwhentheinternalstruc- Ifp→p →...→p wesaythatp isarefinementofp. 1 k k ture of some compound activities is unknown (for instance when WesaythatS →S(cid:48)(w.r.t.τ)ifS(cid:48)isobtainedfromSbyreplac- their providers do not wish to expose their specification). Recall ingtheimplementationpofsomecompoundactivitynodeninS fromdefinition3.1thattheonlydifferencebetweenbusinessgraphs byarefinementp(cid:48) ofp. [Namely,acopyofp(cid:48) isaddedtoP,the andbusinessprocessesisthatthelatterhavedistinguishedstartand mappingτ fornisupdatedtopointtoit,andτ isextendedtomap end nodes. Systems of processes are used to model real life ap- compoundnodesinp(cid:48),tothesameBPsasinP. Finally,ifpisno plications. Systems of graphs will prove useful to model query longertheimplementationofanynode,itisremovedfromP.] answers. For brevity, since we will mostly be dealing with sys- IfS →S →...→S wesaythatS isarefinementofS. (cid:163) 1 k k temsofprocesses,unlessstatedotherwisethetermsystemshould beinterpretedassystemofprocesses. Note that if S is a system, then each of its refinements is also Figure 9 shows a partial system. This is a partial description a system. Figure 10 shows a refinement of the system from Fig- oftheTravelAgencysystemfromFigures1,2(forsimplicity,the ure 9, after one refinement step, in which the implementation of dataandattributenodesareomitted). Thefullsystemshouldalso behaviorwasrefined:thenodelabeledsearchTriphasbeen contain,forexample,theprocessesoftheairline,carreservation, ”zoomedinto”andreplacedbyitsimplementingprocess. andhotelcompanies. 3.2 Simple Queries System Refinement. Given a system S, some BP p in it, and Wenowconsiderqueriesandtheiranswers. Forsimplicitywe acompoundactivitynodeninp,amoredetaileddescriptionofp consider first simple positive queries without negation and joins. (andhenceofS)canbeobtainedbyzooming-inandreplacingthe These,andotherextensions,areconsideredinSection3.4. nodenbyitsimplementation.Wecallthisarefinement. Queries. QueriesaremodeledusingBPpatterns. Thesegener- DEFINITION 3.3. GivenasystemS =(P,τ)andaBPpinP, alizeBPssimilarlytothewaytreepatternsgeneralizeXMLtrees. wesaythatp → p(cid:48) (w.r.t. τ)ifp(cid:48) isobtainedfrompbyreplacing Thelabelsofnodescanbespecified,orleftopenusing∗. Edges somecompoundactivitynode ninpbyitsimplementationτ(n). inagraphcanbeeithersingle-headed,inwhichcasetheyarein- [Namely,nisdeletedfromp,andacopyoftheBPτ(n)isplugged terpretedoveredges,ordouble-headed,inwhichcasetheyarein- 3Inanactualsystem,τ(n)canberepresentedbyattachingtonthe terpreted over paths. Similarly, nodes have a single or a double peerandprocessid(WebserviceURL,operationname, etc.) for boundary, for searching only in the direct implementation of the theimplementationofn. nodeorinallitsrefinements,resp.Wecalledgeswithdoublehead (resp.nodeswithdoubleboundary)transitiveedges(nodes). finitesetofgraphs,wheregraphsmaycontainnon-terminalsym- bols,andwheregrammarrulesallowtoreplaceanon-terminalby DEFINITION 3.4. ABPpatternisatuple(p∗,T,R),where agraphfromagivenfinitecollection. 1. p∗isaBPwherenodesarelabeledbyelementsfromL∪{∗}, Theintuitionisthat,forasystemS,theimplementationrelation- 2. T isadistinguishedsetofedgesandcompoundnodesinp∗ shipscorrespondtogrammarrules; thesystemrefinementscorre- calledthetransitiveedgesandnodes,resp. spondtothegraphlanguagedefinedbythegrammar. Similarly,a 3. Risadistinguishedsetofedgesandnodesinp∗ calledthe queryqcanalsobeviewedasCFGGwhosegraphlanguageconsists resultedgesandnodes,resp. ofallthegraphsthatsatisfythequeryconstraints(i.e. containthe Asimplequeryq isasystemofBPpatterns(Q,τ),whereQisa patternsspecifiedbythequery). Thisintuitioncanbeextendedto setofBPpatterns,andτ isanimplementationfunction. (cid:163) thequeryanswer:Insteadofconstructingexplicitlythepotentially infinite set of results, one may construct a CFGG that represents Toevaluateaquery,itspatternsarematchedtothoseof(refine- it. Specifically, thequeryanswercanbeviewedassomekindof mentsof)thesystem.Amatchiscalledanembedding. “intersection” of the languages defined by the system and query grammars(followedbya“projection”thatomitstheportionsthat DEFINITION 3.5. Letq = (Q,τ)beasimplequeryandletS werenotrequestedasoutput). beasimplesystem. AnembeddingofqintoSisahomomorphism Ingeneral,theintersectionoftwoCFGGlanguagesmaynotbea ρfromthenodesandedgesinqtonodesedgesandpathsinsome refinementS(cid:48) =(P(cid:48),τ(cid:48))ofSs.t. CFGGlanguage[21].(Thisgeneralizesthesamepropertyforstring CFGs.) Inourparticularcase, however, thequeryspecificationis 1. (nodes)eachstart(resp.end)nodeinqismappedtoastart sufficientlysimpletoguaranteetherequiredclosure:onecanshow (resp. end)nodeinS(cid:48); eachconcretenodeinq ismapped toaconcretenodeinS(cid:48)ofthesamekind;andanodewitha that it belongs to a restricted class of CFGGs called recognizable sets[14],forwhichtheintersectionwithanotherCFGGisknownto constantlabelismappedtoanodehavingthesamelabel. yieldaCFGG. Thisimpliesthatinprincipleonecouldemploythe 2. (edges)each(transitive)edgefromnodemtonodeninq intersectionalgorithmpresentedin[14]toconstructafiniterepre- ismappedtoanedge(path)fromρ(m)toρ(n)inS(cid:48). sentation for the query results. The problem, however, with this 3. (implementation)Foreachcompoundactivitynodenin directsolutionisthatthealgorithmof[14]isofhighcomplexity- q,ρmapsthenodesand(transitive)edgesinτ(n)tonodes exponentialinthesizeoftheBPs4 -henceimpracticalforquery and edges (paths) in τ(cid:48)(ρ(n)). If n is not transitive then evaluation. Animportantresultofthepresentworkwastodetect τ(cid:48)(ρ(n))mustbeanoriginalBPofS(i.e.notarefinement). that BP-QL queries form a subclass of the recognizable sets for whichPTIMEsolutionispossible,andtodesignsuchanalgorithm. Thequeryresultdefinedbyρistheimageunderρofq,restricted OuralgorithmisbasedonamodularconstructionofaCFGGthat toitsoutputnodesandedges.Ifthesamenode/edgeoccursseveral describesthequeryresults.Itreliesonthefollowingtwoideas. timesintheimage,distinctcopiesareusedforeachoccurrence.(cid:163) 1. The first idea is that each query result is a combination of Theresultassociatedwithanembeddingρis,ingeneral,asys- smallerresultsthatdescribehowonequeryprocess,saypq, tem of graphs. As the number of such qualifying results may be ismappedtoonesystemprocess,saypS. Thecombination large(possiblyinfinite)weprovideaconcise(andfinite)represen- mustsatisfythecondition(implementation). tationforthisset. 2. Thesecondideaisthatmanyembeddingssharethesameun- derlyingnodemapping, anddifferonlyontheassignments 3.3 Compactrepresentationofqueryresults to the, possibly transitive, edges. Thus, many results for a Tounderstandtheconstructionofthisconciserepresentation,let pairpq,pS,mayberepresentedtogetherbyusingthisshared usfirstlookatthetwomainfactorsthatcontributetolargeorinfi- node mapping (and, as we did above for flat BPs, repre- niteanswers. Weconsiderfirstflatgraphs,i.e. BPswithnocom- sentingthedifferentpossiblepathassignmentsfortransitive poundactivities,andthennestedBPs. edgesbetweenthenodesbyacopyofthesub-graphthatcon- nectsthenodes.) Ofcourse,whenanodemappingfrompq FlatBPsWhenaBPcontainscycles,thenumberofpathsthatmay topS isconsidered,onemustensurethatitsatisfiesthecon- matchagiven(transitive)queryedgemaybeinfinite.Observethat ditions(nodes)and(edges). evenwhenaBPisacyclic,thenumberofmatchingpathsmaybe large. Forexampleiftheactivityflowforksintoseveralpathsand Thus, a CFGG representation for the query results is obtained then joins back, forks and joins again, and so on, several times, fromacollectionofnodemappingsbetweendistinctpairsofquery thenumberofpossiblepathsisexponentialinthenumberofforks. andsystemprocesses,thatsatisfytheconditionsabove. Thesolutiontothisproblemiseasy: Wecanrepresentthesetof Wenextsketchthemainlinesofthisconstruction.Giveaquery pathsbetweentwonodesbyacopyofthesub-graphthatconnects q andasystemS,considersomeBPpatternpq inq andaBPpS thenodes. Onemightactuallysaythisiswhattheuserintended: inS.Theimportantobservationisthatanembeddingfrompq toa toseethespecificationofthepathsbetweenthetwonodes,rather refinementofthesystemprocesspS consistsofseveralparts. The thantheindividualpathsthemselves. first part maps some of the query nodes and edges of pq into pS itself. Subsequentpartsmapadditionalnodesandedgesofpq into NestedBPsThingsbecomemorecomplexinthepresenceofcom- implementationsofcompoundactivitynodesofpS,andsoon.The pound activities. A system may contain recursive activity imple- nodesandedgesinthequerypatternpq thatarenotmappedtopS mentations, hence have an infinite set of refinements. Since the itself,butintoimplementationsofitscompoundnodes,musthave resultsofaqueryareconstructedfromembeddingsintoallthere- astructurethatfitssuchimplementations.Inparticular,theyshould finements,theremaybeinfinitenumberofsuchpossibleresultsas formasetJ ofdisjointsub-graphsofpq,eachwithasingleentry well. Thesolutionhereisbasedonviewingsystemsandqueries andexitnodes. ascontextfreegraphgrammars[21],abbreviatedCFGG. (Confus- ingly, these are also called in the literature regular graph gram- 4Furthermore,toourknowledge,noPTIMEalgorithmsforthisin- mars. We will use only context free in this paper.) A CFGG is a tersectionproblemareknown. Thus,theconstructionisrecursive. ForeachsuchsetJ itmodi- forthepositiveportionsofthequerygraphs,whichcannotbeex- fiesthequerypatternpq,byreplacingeachsub-graphGinJ bya tendedtoincludethenegativeportions,areconsidered. distinctnewnodelabeled∗(denotedinthesequel∗ ),obtaining a modified query graph pq/J. Then it constructs reGpresentations Labelpredicatesandregularpathexpressions. Thesim- ple queries considered so far only allow nodes with a particular forallresultsobtainedfromembeddingsfromthemodifiedquery pq/J totheBPpS(groupingtogether,asexplainedabove,embed- labelor∗. Butsometimesonemaybeinterestedinsystemnodes thatconformtocertainconditions.Forinstance,ratherthansearch- dingsthatagreeontheirnodemappings).Assumethe∗ nodewas G ingforthesearchFlighsactivity,wemaywanttoretrieveall mappedtoacompoundactivitynode,sayA.Intherepresentation, theactivitieswhosenamecontainsthestring“search”.Thiscanbe this node is noted as A , and it serves as a non-terminal of the G achievedbyusinglabelpredicates.Inanembedding,aquerynode grammar. Then,foreachGandA ,itrecursivelyconstructsrep- G labeledbyalabelpredicatemustbemappedtosystemnodewhose resentations, sayR , forresultsobtainedfromembeddingsfrom G labelsatisfiesthepredicate. G to the implementation of A. The grammar rule then allows to Anotherusefulfeatureareregularpathexpressions. Transitive replaceA byR . G G edges in the query may be annotated by regular expressions. In Special care must be paid in the construction above to transi- anembedding,suchedgesmustbemappedtopathssuchthattheir tive edges. These may be mapped to arbitrarily long paths, that maystartinonesystemBPpS,continueinanimplementationof labelsequenceformsawordinthecorrespondingregularlanguage. acompoundactivitynodeofpS,andsoon,possiblythroughacy- Theconstructionofafiniterepresentationforthequeryanswer extendsnaturallytosupportthesetwoextensions. cleofimplementations.Suchpathsarebrokenintocomponentsby introducing special dummy nodes into the query pattern pq, then Variables and joins. Together with label predicates and reg- employingtheconstructionasdescribedabove. ular path expressions, one may also want to use label and path Forlackofspace, weomitherethepresentationofthefullal- variablesandtestfor(in)equalityoftheassignedlabelsandpaths. gorithmanditscorrectnessproof. (Theseareavailableinthefull The interpretation is that query nodes labeled by (un)equal label versionofthepaper[6]).Notethatalmostallgraphsforwhichrep- variables are mapped to system nodes with (distinct)identical la- resentations are constructed are sub-graphs of the query patterns bels;queryedgeslabeledby(un)equalpathvariablesaremappedto pq. (Theintroductionofdummynodescomplicatesthisargument pathswhosesequencesoflabelsare(different)equalwords. While abit.) Theconstructionterminateswheneachmappingfromeach theuseoflabelvariablesposesnoparticularproblem,forqueries suchgraphtoeachsystemprocesshasbeenconsidered(orbefore withjoinsonpathvariables,ourconstructionmayfail;theanswer that,ifembeddingsforsomesub-graphareneverrequired). tosuchqueriesmaynolongerberepresentableasafinitesystem. Theimportantpointtoobserveisthefollowing. To understand why, recall that our systems may be viewed as THEOREM 3.6. Thesizeoftherepresentationforthequeryre- CFGGs. Aquerythattestsforequalityofpathvariablesmayhave sults,constructedbytheabovealgorithm,aswellastheconstruc- forananswersetsofgraphsthatarenotaCFGGlanguageandare tiontime,arepolynomialinthesizeofthesystemS(withtheexpo- inherentlyhardertocompute,asillustratedbythefollowingtheo- nentdeterminedbythesizeofq). rem. Thetheoremalsohighlightsthedifferenceincomputational complexitybetweenthequeryingofflatandnestedgraphs. Theintuitionisthat,intheworstcase,eachsub-graphofeachquery pattern needs to be mapped into each system BP. The number of THEOREM 3.7. For queries with equality conditions on path querysub-graphsoftheappropriateformisafunctionofthequery variables,theproblemoftestingwhetherthequeryanswerisempty size alone. Since embeddings that agree on their node mappings onasystemisundecidable. arerepresentedtogether,itsufficestocountthedistinctnodemap- The problem can be solved in exponential time if the system pings. Foreachquerysub-graphandeachsystemBP,thisnumber to which the query is applied has no recursive activities. It is ispolynomialinthesizeoftheBP(withtheexponentdetermined PSPACE-hardw.r.tthesystemsize,evenifthesystemBPsalsohave bythesizeofthequerysub-graph.) nocycles. Finally,forflatBPs,itcanbesolvedintimepolynomialinthe 3.4 ARichermodel sizeofthesystem(withtheexponentdeterminedbythesizeofq). So far, for the sake of simplicity, we used a very simple data Proof:(sketch)Theundecidabilityandhardnessproofsarebyre- modelandquerylanguage.Wenowpresentsomeusefulextensions duction to the problem of testing whether the intersection of the that enhance the expressive power, and facilitate the querying of reallifebusinessprocesses. languages of two string context free grammars (CFGs) is empty. Giventwo CFGsG1,G2, webuildasystemS withaBPthathas Negation. Inaquerywithnegation,thepatternshavesomenodes twobranches. Thefirstcontainsacompoundactivitynodeg and 1 andedgesthataredistinguishedasnegative.Theintuitiveinterpre- the second a compound activity g . The implementation of g , 2 i tationisthatthequerysearchesforoccurrencesofthepositivepor- i = 1,2, (whichresemblesinspiritthegrammarrulesofG ), is i tionsofthepatterns,forwhichnoneofthenegativepartsco-occur. definedsuchthateachofitspossiblerefinementshasline-shaped Moreformally,todefinethesemanticsofquerieswithnegation structure,representingawordinthecontextfreelanguageofG . i weextendthenotionofembedding:Givenaqueryqwithnegation, Theimplementationsaredefinedsuchthatg andg canberefined 1 2 thepositivepartofq, denotedpositive(q), isthequeryobtained toanactivitysequencewiththesameshapeiffthissequencerep- from q by deleting all the negative edges and nodes, and all the resentsawordthatbelongstoboth G andG . Next, wedefine 1 2 edges incident on these nodes. The embeddings of q into S are queryqwithtwotransitiveedgese ,e thatmatch(therefinements 1 2 thoseembeddingsofpositive(q)whichcannotbeextendedtoan of)g andg resp.,andhaveanequalityconditionontheirattached 1 2 embeddingofanyqueryq(cid:48) obtainedfrompositive(q)byadding pathvariables. Itiseasytoseethatthequeryhasanonemptyre- allthenegativenodesandedgesofsomeofitsBPpatterns. The sultiffthelanguagesintersectionisnotempty. Thisisknownto answerofqisdefinedasbefore,basedontheaboveembeddings. beundecidableinthegeneralcase,andwasrecentlyprovedtobe Afiniterepresentationforthequeryanswercanbeconstructed PSPACE-completefornon-recursivecontextfreelanguages[29]. essentiallyasabove. Theonlydifferenceisthatonlyembeddings Thepolynomialandexponentialalgorithmsworkasfollows.For flat BPs, the algorithm considers all possible mappings of query nodestotheBP.Testingjoinconditionshereamountstotestingif theintersectionoftheregularlanguagesdefinedbythesub-graph that connects the nodes is empty, which can be done in PTIME. For nested, non-recursive, BPs, the algorithm enumerates all the systemrefinements(possiblyanexponentialnumber)andtestsfor theexistenceofalegalembeddinginasimilarway. (cid:163) We have consequently decided to restrict the use of path vari- ablesinBP-QLandallowjoinsonlyonlabelvariables. Distributed systems and queries. Sofar, wehaveignored Figure11:BPELXML. distribution. Inadistributedsetting,eachpeerholdsasetBPsand 4.1 DesignConsiderations may provides (resp. use) activities to (of) remote peers. If the serviceprovidersmaketheirspecificationavailabletotheircoop- BP-QL is based on an intuitive, conceptual model of BPs, an erating organizations (say via a web service), users may wish to abstractionoftheBPELspecification,allowingforsimpleformu- zoom-inontheseremotecomponentsaswelltoquerytheservice lation of queries. over this model. When we considered the im- specification. plementation,thefollowingproblemhadtobeaddressed:Asmen- The data model extends naturally to this setting, associating a tionedinSection1,theBPELXMLformatwasdesignedwithease peeridwitheachprocessandeachactivitynode.Queriesmaythen ofautomaticcodegeneration,ratherthanquerying,inmind.Activ- annotategraphpatternsandactivitynodesbypeerids, restricting itiesandedgesaredefinedseparately,asdistinctactivityandlink thesearchtothespecifiedpeers. Inparticular,whena(transitive) elements. The process flow is only recorded by associating with activity node in a query is annotated by a peer id, the search is eachactivityelementtheidsofitsincomingandoutgoingedges, restrictedtoimplementationssuppliedbythespecifiedpeer(resp. representedresp.bytargetandsourcechildrenofthenode.Thisis refinementsconsistingonlyofinvocationsofactivitiesofthespec- illustratedinFigure11,whichshowstheBPELXMLrepresenta- ifiedpeer).Moregenerally,queriesmayusepredicatesonpeerids tion6oftheTravelAgencybusinessprocessfromFigure1.Conse- torestrictthesearchtoaspecificfamilyofpeers. quently,tocheckwhetherflowpathsofagivenprocesssatisfythe Remark: While the extension of the formal model to a distrib- conditionsdetailedinaBP-QLquery,alarge,possiblyunbounded, utedsettingisratherimmediate,implementation-wise,distribution numberofjoinoperationsinvolvingedgeidsbetweenactivityand posessignificantchallengesintermsofqueryevaluation. Specifi- edgeelementsneedstobeperformed. Whilethisisexpressiblein, cally,wewouldliketoevaluateaqueryina“lazy”manner,sothat say,XQuery,e.g.withtheuseofrecursivefunctions,theexcessive onlythosepeerswhoseprocessesandactivitiesareindeedrelevant numberofjoinsbecomesaperformancebottleneck. tothequeryareconsulted. Furthermore, itisdesirableto“push” Todrasticallyreducethenumberofjoins,wedecidedtostorea partsofthequery,whenpossible,tothepeersholdingtherelevant processspecificationinastructuremoresimilartoitsgraphview. process information. Our implementation, described in the next In XML terms, the parent child relationships in the XML repre- section,addressestheseissues. sentationofaprocessshouldreflectthe“followedby”relationship ofnodesintheprocessgraph.ThiswouldallowtheuseofXPath’s Summary. ThedesignofBP-QLwasdirectedbythespecialre- ”/”and”//”operatorsforqueryingflowpaths,avoidingmanyjoins. quirementofqueryingspecificationswithazoom-infeatureatdif- But,sinceatypicalBPisagraph,ratherthanatree,wealsouse ferentlevelsofgranularityandtheretrievalofqualifyingexecution XMLidrefstocapturethegraphstructure. paths. As explained above, this required a careful design of the Anotherfundamentaldecisiontobemadewaswhichofthefol- languagetoavoidfeaturesthatmightseemtobeworthyofinclu- lowingtwooptionstochoose:(1)toimplementawholenewquery sion in the language, such as joins on path variables, but incur a engineforourmodelfromscratch,or(2)torelyonsomeexisting prohibitivelyhighcomputationalcost. queryenginetoperformasmuchaspossiblefromthecomputation, ThecharacterizationoftheexactexpressivepowerofBP-QLis andcompletetheprocessingofthemissingfeaturesbyanadequate anon-goingresearch.OurinitialresultsindicatethatBP-QLcanbe pre and post processing of queries and query results. We opted characterizedasaparticularsubclassofFO(TC)5.Inparticular,for forthesecondoption. Theissuestobeconsideredinselectingan flatBPs BP-QLcaptures powersimilar tothat ofthe conjunctive enginewerethefollowing: partofXPathandcoreXQuery,includingnegation,whenconsid- • Ourquerylanguageallowstoretrievepaths,whereastypical eredinthecontextofgraphs. Duetospacelimitationsthisisnot existingXML/graphquerylanguagesonlyretrievenodes. presentedhere. • Ourquerylanguageoffersazoom-infacility. • Businessprocessestypicallyoperateinacross-organization, 4. IMPLEMENTATION distributed environment. The specifications of the services Thequerylanguagepresentedabovehasbeenfullyimplemented participating in process may reside on distinct peers. Dis- andtestedintheBP-QLpeer-to-peersystem.Thesystemprovides tributedqueryprocessingthusbecomesessential. persistentstorageforBPELspecifications, allowsuserstodesign A natural candidate was to use a standard XQuery engine, en- newprocesses,andtoqueryexistingspecifications. joying the benefits of indexing and optimization offered by such ThevisualinterfaceofthesystemisimplementedasanEclipse engines.7 However,XQuerydoesnotsupporttheretrievalofpaths, [20]plug-in. Itallowsto:designnewbusinessprocessesandstore distribution,orzoom-inqueries;nordoesit“traverse”idrefs.Nec- theirspecificationsintherepository; importexistingBPELdocu- essarily,allofthesewouldhavetobeimplementedbypreandpost mentstotherepository;formulatequeries,runthemandviewthe 6For simplicity, the figure provides an abstraction of the actual results. Therestofthesectionisdevotedtothemaincomponent BPELXMLfilestructure,withmanydetailsomitted. —thequeryengine. 7AnalternativeviablesolutiontothegraphshapeofBPscouldbe 5FirstOrderLogicaugmentedwithTransitiveClosure. touseanativegraphqueryengine. processing. Consequently, wedecidedtobaseoursolutiononan extensionofXML,calledActiveXML(AXMLforshort).AXML is essentially a middleware system that includes an XQuery-like querylanguage,butoffersadditionalfacilitieswhichprovidebetter supportforaddressingsomeoftheaboveissues. Additionalben- efitsincludecertainoptimizationtechniquesthatareimplemented intheAXMLsystem,asexplainedbelow. AbriefoverviewofActiveXML. ActiveXML(AXML,for short)isadeclarativeframeworkthatharnessesWebservicesfor data integration, and works in a peer-to-peer architecture[3]. An AXMLdocumentisanXMLdocumentwheresomedataisgiven extensionally,asregularXMLelements,whileotherdataisgiven intensionally,bymeansofcallstoWebservices[3],andcanbema- terializedbyinvokingtheservices.AXMLemploysthequerylan- guageXOQL,anXQuery-likequerylanguageasitsqueryengine. Figure12:AXMLtree. WhenaqueryisevaluatedonanAXMLdocument,theservicecalls whose answer may be relevant for the query are identified; only Thusqueryevaluationcanaccessthereturnedsubtreeasifit these calls are invoked. Additionally, (sub-)queries are pushed, actuallytraversedthe“pointer”. whenpossible,totheserviceproviders,thusreducingthecostsof • A getOperation service call retrieves the specification datamaterializationandtransfer. Recursivecallsaretracked,and ofaremotecompoundactivityandconvertsit,whenneeded, onlytherelevantdataismaterialized(see[2]fordetails). fromBPELformattoanAXMLrepresentation.Azoom-out elementisattachedtoitsfinalstate, sothatitpointstothe Insummary,BP-QLusestheAXMLsystem[3]asanimplemen- followingactivityintheflow. tationplatform.ThefacilitiesofferedbyAXMLareusedtoaddress our needs, as follows: Intentional data, implemented by service Toillustratethefirsttypeofcall,thegetActivity("join") calls,areusedinourimplementationto(1)retrieve,whenneeded, nodes below the searchFlights and searchRooms, in the thespecificationsofremoteprocesses,thussupportingdistributed middleofFigure12,pointtothejoinnodebelowsearchCars. processing, and (2) account for the graph structure of the speci- Theyrepresentthefactthatthethreesearchesarefollowedbythat fication (service calls play here role similar to XML idrefs, with samejoinoperation.8 theadvantagethattheyaretraversedautomaticallyinqueryevalu- ThegetOperation("searchRooms,"join")inthefig- ation). BPELdocumentsarewrappedandrepresentedasAXML ureillustratesthesecondtypeofcall. Itretrievesthespecification documents;BP-QLqueriesarepre-processedandcompiledintoa ofthesearchRoomsprocess,andsetitszoom-outtothefollowing setofXQuery-likequeriesoversuchdocuments. Postprocessing “join” operation. Here again, AXML invokes getOperation isemployedtocompletethecomputation,e.g. tovalidatezoom-in calls for the remote activities whose specification is judged to be relationships,toextractpathsandtoconstructacompactrepresen- relevant for query evaluation. As mentioned above, it may also tationfortheresult. “push”(sub-)queriestocapableserviceproviders,suchasBP-QL peersthat“understand”BP-QLqueries. From BP-QL to AXML. Here is a brief description of the DataelementsanddataflowarerepresentedinAXMLtreeina AXML representation of a BP-QL business process. The repre- similar manner: The tree contains both data and activity element sentation consists of three parts: Process properties (such as the nodes. getData and getActivity service calls are used as serviceprovider,theservicetypeandcapabilities)aremaintained “references”betweentreedataandactivitynodes,resp. as UDDI entries in a (standard) XML document. The other two, TogeneratetheAXMLrepresentation,theBP-QLgraphistra- namelytheprocessactivitiesandexecutionflow,andthedataele- versedinadepth-firstorder,buildingAXMLtreesasdeepaspossi- mentsandthedataflow,aremaintainedintwoAXMLdocuments. ble.Localcompoundactivitiesarethenzoomed-inandtheirgraphs TheuseoftwoAXMLtrees, ratherthanone, allowsforefficient aresimilarlydetailed, recursively. Requeststoremoteoperations evaluationofBP-QLquerieswithdoubleheadededges: itallows are represented by getOperation service calls. Web services adoublyheadedactivity(resp. data)flowedgetobemappedtoa aregeneratedforprovidedoperations,exposingtheirspecification “//”operatoronthecorrespondingAXMLdocument. totherequestingpeers. Forexample,Figure12describes(partof)theAXMLtreeforthe With this representation, both the path-based and the zoom-in Alpha-Toursactivitiesandflow.(Hereagain,forsimplicity,onlyan axisconditionscanbeevaluatedusingXOQLqueriesontheAXML abstractionoftheactualAXMLtreeisprovided,withmanydetails documents.Somepostprocessingisneverthelessrequiredtomatch omitted.) EachactivityisrepresentedbyanXMLelementnodein upthecomponents,extracttherequestedpaths(XOQL,likemost thetree. Theparentchildrelationshipsreflecttheflow. Eachnode XQueryengines, returnsonlydocumentelementsnotpaths), and representingacompoundactivityistheroot(labeledbyzoom-in)of constructacompactrepresentationoftheresult. Weomitthede- asubtreethatdescribestheinternalstructureoftheactivity.Nodes tailsforspaceconstraints. with bold labels are special elements that represent calls to Web 4.2 Trade-offs services.Twotypesofsuchcallsareembeddedinthedocument: Asexplainedabove,wehavedecidedtostoreBPsinastructure • A getActivity service call plays a role similar to that of an closetotheBPgraphshape,ratherthanintheBPELformat.Obvi- XML idref, “pointing” to acertain node inthe tree. When ously,thisreducesthenumberofjoinoperationsrequiredinquery a query is evaluated, the relevant calls are detected and in- voked. (Cycles are detected and cut by AXML). For each 8In the implementation, the input to the getActivity is a uniqueidentifierforthepointedactivity,consistingofBPandac- call, thereturneddata(acopyofthesub-tree“pointedto”) tivityids.Itisabstractedhere,forbrevity,bytheactivityname. isinsertedinplaceoftheservicecall,readytobeaccessed.

Description:
Catriel Beeri. The Hebrew University [email protected]. Anat Eyal. Tel Aviv University [email protected]. Simon Kamenkovich. Tel Aviv University.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.