Exchanging Intensional XML Data

TOVA MILO
INRIA and Tel-Aviv University
SERGE ABITEBOUL
INRIA
BERND AMANN
Cedric-CNAM and INRIA-Futurs
and
OMAR BENJELLOUN and FRED DANG NGOC
INRIA

XML is becoming the universal format for data exchange between applications. Recently, the emergence of Web services as standard means of publishing and accessing data on the Web introduced a new class of XML documents, which we call intensional documents. These are XML documents where some of the data is given explicitly while other parts are defined only intensionally by means of embedded calls to Web services.

When such documents are exchanged between applications, one has the choice of whether or not to materialize the intensional data (i.e., to invoke the embedded calls) before the document is sent. This choice may be influenced by various parameters, such as performance and security considerations. This article addresses the problem of guiding this materialization process.

We argue that, as for regular XML data, schemas (à la DTD and XML Schema) can be used to control the exchange of intensional data and, in particular, to determine which data should be materialized before sending a document, and which should not. We formalize the problem and provide algorithms to solve it. We also present an implementation that complies with real-life standards for XML data, schemas, and Web services, and is used in the Active XML system. We illustrate the usefulness of this approach through a real-life application for peer-to-peer news exchange.

Categories and Subject Descriptors: H.2.5 [Database Management]: Heterogeneous Databases
General Terms: Algorithms, Languages, Verification
Additional Key Words and Phrases: Data exchange, intensional information, typing, Web services, XML

This work was partially supported by EU IST project DBGlobe (IST 2001-32645).
This work was done while T. Milo, O. Benjelloun, and F. D. Ngoc were at INRIA-Futurs.
Authors’ current addresses: T. Milo, School of Computer Science, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel; email: [email protected]; S. Abiteboul and B. Amann, INRIA-Futurs, Parc Club Orsay-University, 4 Rue Jean Monod, 91893 Orsay Cedex, France; email: {serge,abiteboul, bernd.amann}@inria.fr; O. Benjelloun, Gates Hall 4A, Room 433, Stanford University, Stanford, CA 94305-9040; email: [email protected]; F. D. Ngoc, France Telecom R&D and LRI, 38–40, rue du Général Leclerc, 92794 Issy-Les Moulineaux, France; email: Frederic.dangngoc@rd.francetelecom.com.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2005 ACM 0362-5915/05/0300-0001 $5.00
ACM Transactions on Database Systems, Vol. 30, No. 1, March 2005, Pages 1–40.

1. INTRODUCTION

XML, a self-describing semistructured data model, is becoming the standard format for data exchange between applications. Recently, the use of XML documents where some parts of the data are given explicitly, while others consist of programs that generate data, started gaining popularity. We refer to such documents as intensional documents, since some of their data are defined by programs. We term materialization the process of evaluating some of the programs included in an intensional XML document and replacing them by their results. The goal of this article is to study the new issues raised by the exchange of such intensional XML documents between applications, and, in particular, how to decide which parts of the data should be materialized before the document is sent and which should not.

This work was developed in the context of the Active XML system [Abiteboul et al.
2002, 2003b] (see also the Active XML homepage at http://www-rocq.inria.fr/verso/Gemo/Projects/axml). The latter is centered around the notion of Active XML documents, which are XML documents where part of the content is explicit XML data whereas other parts are generated by calls to Web services. In the present article, we are only concerned with certain aspects of Active XML that are also relevant to many other systems. Therefore, we use the more general term of intensional documents to denote documents with such features.

To understand the problem, let us first highlight an essential difference between the exchange of regular XML data and that of intensional XML data. In frameworks such as those of Sun¹ or PHP,² intensional data is provided by programming constructs embedded inside documents. Upon request, all the code is evaluated and replaced by its result to obtain a regular, fully materialized HTML or XML document, which is then sent. In other terms, only extensional data is exchanged. This simple scenario has recently changed due to the emergence of standards for Web services such as SOAP, WSDL,³ and UDDI.⁴ Web services are becoming the standard means to access, describe, and advertise valuable, dynamic, up-to-date sources of information over the Web. Recent frameworks such as Active XML, but also Macromedia MX⁵ and Apache Jelly,⁶ started allowing for the definition of intensional data, by embedding calls to Web services inside documents.

¹ See Sun's Java server pages (JSP) online at http://java.sun.com/products/jsp.
² See the PHP hypertext preprocessor at http://www.php.net.
³ See the W3C Web services activity at http://www.w3.org/2002/ws.

This new generation of intensional documents has a property that we view here as crucial: since Web services can essentially be called from anywhere on the Web, one does not need to materialize all the intensional data before sending a document. Instead, a more flexible data exchange paradigm is possible, where the sender sends an intensional document, and gives the receiver the freedom
to materialize the data if and when needed. In general, one can use a hybrid approach, where some data is materialized by the sender before the document is sent, and some by the receiver.

⁴ UDDI stands for Universal Description, Discovery, and Integration of Business for the Web. Go online to http://www.uddi.org.
⁵ Macromedia Coldfusion MX. Go online to http://www.macromedia.com/.
⁶ Jelly: Executable XML. Go online to http://jakarta.apache.org/commons/sandbox/jelly.

As a simple example, consider an intensional document for the Web page of a local newspaper. It may contain some extensional XML data, such as its name, address, and some general information about the newspaper, and some intensional fragments, for example, one for the current temperature in the city, obtained from a weather forecast Web service, and a list of current art exhibits, obtained, say, from the TimeOut local guide. In the traditional setting, upon request, all calls would be activated, and the resulting fully materialized document would be sent to the client. We allow for more flexible scenarios, where the newspaper reader could also receive a (smaller) intensional document, or one where some of the data is materialized (e.g., the art exhibits) and some is left intensional (e.g., the temperature). A benefit that can be seen immediately is that the user is now able to get the weather forecast whenever she pleases, just by activating the corresponding service call, without having to reload the whole newspaper document.
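The hybrid exchange in the newspaper example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Call` class, the `materialize` helper, the stub functions standing in for remote Web services, and every concrete value are hypothetical and are not taken from the Active XML system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """An embedded service call (intensional data); hypothetical representation."""
    service: str
    argument: str


# Stubs standing in for remote Web services; names and results are made up.
def get_temp(city):
    return [("temp", "16")]


def time_out(kind):
    return [("exhibit", "Picasso"), ("exhibit", "Monet")]


SERVICES = {"Get_Temp": get_temp, "TimeOut": time_out}

# The newspaper page: extensional items are (label, value) pairs,
# intensional items are Call objects.
newspaper = [
    ("title", "The Sun"),
    ("date", "04/10/2002"),
    Call("Get_Temp", "Paris"),
    Call("TimeOut", "exhibits"),
]


def materialize(items, to_invoke):
    """Invoke only the calls whose service name is in `to_invoke`."""
    result = []
    for item in items:
        if isinstance(item, Call) and item.service in to_invoke:
            returned = SERVICES[item.service](item.argument)
            # A result may itself contain calls, hence the recursion.
            result.extend(materialize(returned, to_invoke))
        else:
            result.append(item)
    return result


# Sender materializes the exhibits but leaves the temperature intensional.
sent = materialize(newspaper, {"TimeOut"})
# Traditional setting: everything is materialized before sending.
fully_materialized = materialize(newspaper, set(SERVICES))
```

In this sketch the receiver of `sent` still holds the `Get_Temp` call and can invoke it later, whereas `fully_materialized` contains extensional items only.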
Before getting to the description of the technical solution we propose, let us first see some of the considerations that may guide the choice of whether or not to materialize some intensional data:

- Performance. The decision of whether to execute calls before or after the data transfer may be influenced by the current system load or the cost of communication. For instance, if the sender's system is overloaded or communication is expensive, the sender may prefer to send smaller files and delegate as much materialization of the data as possible to the receiver. Otherwise, it may decide to materialize as much data as possible before transmission, in order to reduce the processing on the receiver's side.
- Capabilities. Although Web services may in principle be called remotely from anywhere on the Internet, it may be the case that the particular receiver of the intensional document cannot perform them; for example, a newspaper reader's browser may not be able to handle the intensional parts of a document. And even if it can, the user may not have access to a particular service, for example, because of the lack of access rights. In such cases, it is compulsory to materialize the corresponding information before sending the document.
- Security. Even if the receiver is capable of invoking service calls, she may prefer not to do so for security reasons. Indeed, service calls may have side effects. Receiving intensional data from an untrusted party and invoking the calls embedded in it may thus lead to severe security violations. To overcome this problem, the receiver may decide to refuse documents with calls to services that do not belong to some specific list. It is then the responsibility of a helpful sender to materialize all the data generated by such service calls before sending the document.
- Functionalities. Last but not least, the choice may be guided by the application. In some cases, for example, for a UDDI-like service registry, the origin of the information is what is truly requested by the receiver, and hence service

Fig. 1.
Data exchange scenario for intensional documents.

calls should not be materialized. In other cases, one may prefer to hide the true origin of the information, for example, for confidentiality reasons, or because it is an asset of the sender, so the data must be materialized. Finally, calling services might also involve some fees that should be paid by one or the other party.

Observe that the data returned by a service may itself contain some intensional parts. As a simple example, TimeOut may return a list of 10 exhibits, along with a service call to get more. Therefore, the decision of materializing some information or not is inherently a recursive process. For instance, for clients who cannot handle intensional documents, the newspaper server needs to recursively materialize the whole document before sending it.

How can one guide the materialization of data? For purely extensional data, schemas (like DTD and XML Schema) are used to specify the desired format of the exchanged data. Similarly, we use schemas to control the exchange of intensional data and, in particular, the invocation of service calls. The novelty here is that schemas also specify which parts of the data are allowed to be intensional and which service calls may appear in the documents, and where. Before sending information, the sender must check if the data, in its current structure, matches the schema expected by the receiver. If not, the sender must perform the calls required to transform the data into the desired structure, if this is possible.

A typical such scenario is depicted in Figure 1. The sender and the receiver, based on their personal policies, have agreed on a specific data exchange schema. Now, consider some particular data $t$ to be sent (represented by the grey triangle in the figure). In fact, this document represents a set of equivalent, increasingly materialized, pieces of information: the documents that may be obtained from $t$ by materializing some of the service calls ($q$, $g$, and $f$).
Among them, the sender must find at least one document conforming to the exchange schema (e.g., the dashed one) and send it.

This schema-based approach is particularly relevant in the context of Web services, since their input parameters and their results must match particular XML Schemas, which are specified in their WSDL descriptions. The techniques presented in this article can be used to achieve that.

The contributions of the article are as follows:

(1) We provide a simple but flexible XML-based syntax to embed service calls in XML documents, and introduce an extension of XML Schema for describing the required structure of the exchanged data. This consists of adding new type constructors for service call nodes. In particular, our typing distinguishes between accepting a concrete type, for example, a temperature element, and accepting a service call returning some data of this type, for example, () → temperature.
(2) Given a document $t$ and a data exchange schema, the sender needs to decide which data has to be materialized. We present algorithms that, based on schema and data analysis, find an effective sequence of call invocations, if such a sequence exists (or detect a failure if it does not). The algorithms provide different levels of guarantee of success for this rewriting process, ranging from "sure" success to a "possible" one.
(3) At a higher level, in order to check compatibility between applications, the sender may wish to verify that all the documents generated by its application may be sent to the target receiver, which involves comparing two schemas. We show that this problem can be easily reduced to the previous one.
(4) We illustrate the flexibility of the proposed paradigm through a real-life application: peer-to-peer news syndication. We will show that Web services can be customized by using and enforcing several exchange schemas.

As explained above, our algorithms find an effective sequence of call invocations, if one exists, and detect failure otherwise. In a more general context, an error may arise because of type discrepancies between the caller and the receiver.
One may then want to modify the data and convert it to the right structure, using data translation techniques such as those provided by Cluet et al. [1998] and Doan et al. [2001]. As a simple example, one may need to convert a temperature from Celsius degrees to Fahrenheit. In our context, this would amount to plugging (possibly automatically) intermediary external services to perform the needed data conversions. Existing data conversion algorithms can be adapted to determine when conversion is needed. Our typing algorithms can be used to check that the conversions lead to matching types. Data conversion techniques are complementary and could be added to our framework. But the focus here is on partially materializing the given data to match the specified schema.

The core technique of this work is based on automata theory. For presentation reasons, we first detail a simplified version of the main algorithm. We then describe a more dynamic, optimized one that is based on the same core idea and is used in our implementation.

Although the problems studied in this article are related to standard typing problems in programming languages [Mitchell 1990], they differ here due to the regular expressions present in XML schemas. Indeed, the general problem that will be formalized here was recently shown to be undecidable by Muscholl et al. [2004]. We will introduce a restriction that is practically founded, and leads to a tractable solution.
All the ideas presented here have been implemented and tested in the context of the Active XML system [Abiteboul et al. 2002] (see also the Active XML homepage at http://www-rocq.inria.fr/verso/Gemo/Projects/axml). This system provides persistent storage for intensional documents with embedded calls to Web services, along with active features to automatically trigger these services and thus enrich/update the intensional documents. Furthermore, it allows developers to declaratively specify Web services that support intensional documents as input and output parameters. We used the algorithms described here to implement a module that controls the types of documents being sent to (and returned by) these Web services. This module is in charge of materializing the appropriate data fragments to meet the interface requirements.

In the following, we assume that the reader is familiar with XML and its typing languages (DTD or XML Schema). Although some basic knowledge about SOAP and WSDL might be helpful to understand the details of the implementation, it is not necessary.

The article is organized as follows: Section 2 describes a simple data model and schema specification language and formalizes the general problem. Additional features for a richer data model that facilitate the design of real-life applications are also introduced informally. Section 3 focuses on difficulties that arise in this context, and presents the key restriction that we consider. It also introduces the notions of "safe" and "possible" rewritings, which are studied in Sections 4 and 5, respectively. The problem of checking compatibility between intensional schemas is considered in Section 6. The implementation is described in Section 7. Then, we present in Section 8 an application of the algorithms to Web services customization, in the context of peer-to-peer news syndication. The last section studies related work and concludes the article.

2.
THE MODEL AND THE PROBLEM

To simplify the presentation, we start by formalizing the problem using a simple data model and a DTD-like schema specification. More precisely, we define the notion of rewriting, which corresponds to the process of invoking some service calls in an intensional document, in order to make it conform to a given schema. Once this is clear, we explain how things can be extended to provide the features ignored by the first simple model, and in particular we show how richer schemas are taken into account.

2.1 The Simple Model

We first define documents, then move to schemas, before formalizing the key notion of rewritings, and stating the results obtained in this setting, which will be detailed in the following sections.

Fig. 2. An intensional document before/after a call.

2.1.1 Simple Intensional XML Documents. We model intensional XML documents as ordered labeled trees consisting of two types of nodes: data nodes and function nodes. The latter correspond to service calls. We assume the existence of some disjoint domains: $\mathcal{N}$ of nodes, $\mathcal{L}$ of labels, $\mathcal{F}$ of function names,⁷ and $\mathcal{D}$ of data values. In the sequel we use $v, u, w$ to denote nodes, $a, b, c$ to denote labels, and $f, g, q$ to denote function names.

Definition 2.1. An intensional document $d$ is an expression $(T, \lambda)$, where $T = (N, E, <)$ is an ordered tree. $N \subset \mathcal{N}$ is a finite set of nodes, $E \subset N \times N$ are the edges, $<$ associates with each node in $N$ a total order on its children, and $\lambda : N \to \mathcal{L} \cup \mathcal{F} \cup \mathcal{D}$ is a labeling function for the nodes, where only leaf nodes may be assigned data values from $\mathcal{D}$.

Nodes with a label in $\mathcal{L} \cup \mathcal{D}$ are called data nodes while those with a label in $\mathcal{F}$ are called function nodes. The children subtrees of a function node are the function parameters. When the function is called, these subtrees are passed to it. The return value then replaces the function node in the document. This is illustrated in Figure 2, where data nodes are represented by circles, function nodes are represented by squares, and data values are quoted.
Here, the Get_Temp Web service is invoked with the city name as a parameter. It returns a temp element, which replaces the function node. An example of the actual XML representation of intensional documents is given in Section 7. Observe that the parameter subtrees and the return values may themselves be intensional documents, that is, contain function nodes.

2.1.2 Simple Schemas. We next define simple DTD-like schemas for intensional documents. The specification associates (1) a regular expression with each element name that describes the structure of the corresponding elements, and (2) a pair of regular expressions with each function name that describe the function signature, namely, its input and output types.

Definition 2.2. A document schema $s$ is an expression $(L, F, \tau)$, where $L \subset \mathcal{L}$ and $F \subset \mathcal{F}$ are finite sets of labels and function names, respectively; $\tau$ is a function that maps each label name $l \in L$ to a regular expression over $L \cup F$ or to the keyword “data” (for atomic data), and maps each function name $f \in F$ to a pair of such expressions, called the input and output type of $f$ and denoted by $\tau_{in}(f)$ and $\tau_{out}(f)$.

⁷ We assume in this model that function names identify Web service operations. This translates in the implementation to several parameters (URL, operation name, ...) that allow one to invoke the Web services.

For instance, the following is an example of a schema:

data:
    $\tau(newspaper) = title.date.(Get\_Temp \mid temp).(TimeOut \mid exhibit^*)$
    $\tau(title) = data$
    $\tau(date) = data$
    $\tau(temp) = data$
    $\tau(city) = data$
    $\tau(exhibit) = title.(Get\_Date \mid date)$        (*)
functions:
    $\tau_{in}(Get\_Temp) = city$
    $\tau_{out}(Get\_Temp) = temp$
    $\tau_{in}(TimeOut) = data$
    $\tau_{out}(TimeOut) = (exhibit \mid performance)^*$
    $\tau_{in}(Get\_Date) = title$
    $\tau_{out}(Get\_Date) = date$

We next define the semantics of a schema, that is, the set of its instances. To do so, if $R$ is a regular expression over $L \cup F$, we denote by $lang(R)$ the regular language defined by $R$. The expression $lang(data)$ denotes the set of data values in $\mathcal{D}$.

Definition 2.3.
An intensional document $t$ is an instance of a schema $s = (L, F, \tau)$ if for each data node (respectively function node) $n \in t$ with label $l \in L$ (respectively $l \in F$), the labels of $n$'s children form a word in $lang(\tau(l))$ (respectively in $lang(\tau_{in}(l))$).

For a function name $f \in F$, a sequence $t_1, \ldots, t_n$ of intensional trees is an input instance (respectively output instance) of $f$, if the labels of the roots form a word in $lang(\tau_{in}(f))$ (respectively $lang(\tau_{out}(f))$), and all the trees are instances⁸ of $s$.

⁸ Like in DTDs, every subtree conforms to the same schema as the whole document.

It is easy to see that the document of Figure 2(a) is an instance of the schema of (*), but not of a schema with $\tau'$ identical to $\tau$ above, except for

    (**)  $\tau'(newspaper) = title.date.temp.(TimeOut \mid exhibit^*)$.

However, since $\tau_{out}(Get\_Temp) = temp$, the document can always be turned into an instance of the schema of (**), by invoking the Get_Temp service call and replacing it by its return value. On the other hand, consider a schema with $\tau''$ identical to $\tau$, except for

    (***)  $\tau''(newspaper) = title.date.temp.exhibit^*$.

According to its signature, a call to TimeOut may also return performance elements. Therefore, in general, the document may not become an instance of the schema of (***). However, it is possible that it becomes one (if TimeOut returns a sequence of exhibits). The only way to know is to call the service.

This type of "on-line" testing is fine if the calls have no side effects or do not cost money. If they do, we might want to warn the sender, before invoking the call, that the overall process may not succeed, and see if she wants to proceed nevertheless.

2.1.3 Rewritings. When the proper invocation of service calls is certain to lead to the desired structure, we say that the rewriting is safe, and when it only possibly does, that this is a possible rewriting. These notions are formalized next.

Definition 2.4.
For a tree $t$, we say that $t \xrightarrow{v} t'$ if $t'$ is obtained from $t$ by selecting a function node $v$ in $t$ with some label $f$ and replacing it by an arbitrary output instance of $f$.⁹ If $t \xrightarrow{v_1} t_1 \xrightarrow{v_2} t_2 \cdots \xrightarrow{v_n} t_n$, we say that $t$ rewrites into $t_n$, denoted $t \xrightarrow{*} t_n$. The nodes $v_1, \ldots, v_n$ are called the rewriting sequence. The set of all trees $t'$ s.t. $t \xrightarrow{*} t'$ is denoted $ext(t)$.

Note that in the rewriting process, the replacement of a function node $v$ by its output instance is independent of any function semantics. In particular, we may replace two occurrences of the same function by two different output instances. Semantically, this can be interpreted as if the value returned by the function changes over time. This captures the behavior of real-life Web services, like a temperature or stock exchange service, where two consecutive calls may return a different result.

Definition 2.5. Let $t$ be a tree and $s$ a schema. We say that $t$ possibly rewrites into $s$ if $ext(t)$ contains some instance of $s$. We say that $t$ safely rewrites into $s$ either if $t$ is already an instance of $s$, or if there exists some node $v$ in $t$ such that all trees $t'$ where $t \xrightarrow{v} t'$ safely rewrite into $s$.

The fact that $t$ safely rewrites into $s$ means that we can be sure, without actually making any call, that we can choose a sequence of calls that will turn $t$ into an instance of $s$. For instance, the document of Figure 2(a) safely rewrites into the schema of (**) but only possibly rewrites into that of (***).

Finally, to check compatibility between applications, we may want to check whether all documents generated by one application (e.g., the sender application) can be safely rewritten into the structure required by the second application (e.g., the agreed data exchange format).

Definition 2.6. Let $s$ be a schema with some distinguished label $r$ called the root label. We say that $s$ safely rewrites into another schema $s'$ if all the instances $t$ of $s$ with root label $r$ rewrite safely into instances of $s'$.
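Definitions 2.3 and 2.4 can be made concrete with a small Python sketch. It is not the article's algorithm (the automata-based algorithms come in later sections); it only checks whether a tree is an instance of a schema and performs a single rewriting step. Several things here are assumptions made for illustration: the `Node` class and helper names are hypothetical, content models are written as Python regular expressions over label names each followed by one space, and the sample document merely has the shape described for Figure 2(a), with made-up values.

```python
import re
from dataclasses import dataclass, field

DATA = "data"  # keyword for atomic content


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # Node objects or str data values


def is_instance(node, tau, tau_in):
    """Definition 2.3 (simplified): check each node against its content model."""
    # Function nodes are checked against their input type, data nodes against tau.
    model = tau_in.get(node.label, tau.get(node.label))
    if model is None:
        return False
    if model == DATA:
        return len(node.children) == 1 and isinstance(node.children[0], str)
    if not all(isinstance(c, Node) for c in node.children):
        return False
    word = "".join(c.label + " " for c in node.children)
    return (re.fullmatch(model, word) is not None
            and all(is_instance(c, tau, tau_in) for c in node.children))


def rewrite(tree, target, forest):
    """One step of Definition 2.4: replace function node `target` by `forest`."""
    children = []
    for c in tree.children:
        if c is target:
            children.extend(forest)
        elif isinstance(c, Node):
            children.append(rewrite(c, target, forest))
        else:
            children.append(c)
    return Node(tree.label, children)


# Schema (*), then the variant (**) that differs only on newspaper.
TAU = {
    "newspaper": r"title date (Get_Temp |temp )(TimeOut |(exhibit )*)",
    "title": DATA, "date": DATA, "temp": DATA, "city": DATA,
    "exhibit": r"title (Get_Date |date )",
}
TAU_IN = {"Get_Temp": r"city ", "TimeOut": DATA, "Get_Date": r"title "}
TAU2 = dict(TAU, newspaper=r"title date temp (TimeOut |(exhibit )*)")

get_temp = Node("Get_Temp", [Node("city", ["Paris"])])
doc = Node("newspaper", [Node("title", ["The Sun"]), Node("date", ["04/10/2002"]),
                         get_temp, Node("TimeOut", ["exhibits"])])

# Replace the Get_Temp call by one possible output instance (a temp element).
after_call = rewrite(doc, get_temp, [Node("temp", ["16"])])
```

With these assumptions, `doc` is accepted under `TAU` and rejected under `TAU2`, while `after_call` is accepted under `TAU2`, which mirrors the discussion of schemas (*) and (**).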
For instance, consider the schema of (*) presented above with newspaper as the root label. This schema safely rewrites into the schema of (**) but does not safely rewrite into the one of (***).

⁹ By replacing the node by an output instance we mean that the node $v$ and the subtree rooted at it are deleted from $t$, and the forest trees $t_1, \ldots, t_n$ of some output instance of $f$ are plugged at the place of $v$ (as children of $v$'s parent).

2.1.4 The Results. Going back to the data exchange scenario described in the introduction, we can now specify our main contributions:

(1) We present an algorithm that tests whether a document $t$ can be safely rewritten into some schema $s$ and, if so, provides an effective rewriting sequence.
(2) When safe rewriting is not possible, we present an algorithm that tests whether $t$ may be possibly rewritten into $s$, and finds a possibly successful rewriting sequence, if one exists.
(3) We also provide an algorithm for testing, given two schemas, whether one can be safely rewritten into the other.

2.2 A Richer Data Model

In order to make our presentation clear, and to simplify the definition of document and schema rewritings, we used a very simple data model and schema language. We will now present some useful extensions that bring more expressive power, and facilitate the design of real-life applications.

2.2.1 Function Patterns. The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document. But sometimes, one does not know in advance which functions will be used at a given place, and yet may want to allow their usage, provided that they conform to certain conditions.
For instance, we may have several editions of the newspaper of Figure 2(a), for different cities. A common intensional schema for such documents should not require the use of a particular Get_Temp function, but rather allow for a set of functions that have a suitable signature: they should accept a city element as a single parameter, and return a temperature element, as previously defined in $\tau$. The particular weather forecast service that will be used may depend on the city and be, for instance, retrieved from some UDDI service registry. One may also want to enforce some security policies, for example, by specifying that the allowed functions should return only extensional results.

To specify such sets of functions, we use function patterns. A function pattern definition consists of a Boolean predicate over function names and a function signature. A function belongs to the pattern if its name satisfies the Boolean predicate and its signature is the same as the required one. A more liberal definition would be one that requires that the function signature only be subsumed by the one specified in the definition, that is, that every instance of the former be also an instance of the latter. This is possible but is computationally heavier, since it entails checking inclusion of the tree languages defined by the two schemas.

In terms of implementation, one can assume that this new Boolean predicate is implemented as a Web service that takes a function name as input and returns true or false.

To take this feature into account in our model, we define $\mathcal{P}$ to be a domain of function pattern names. A schema $s = (L, F, P, \tau)$ now also contains, in addition to the elements and functions, a set of function patterns $P \subset \mathcal{P}$. $\tau$ associates with