Lecture Notes in Computer Science 6378
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

Deborah L. McGuinness, James R. Michaelis, Luc Moreau (Eds.)
Provenance and Annotation of Data and Processes
Third International Provenance and Annotation Workshop, IPAW 2010
Troy, NY, USA, June 15-16, 2010
Revised Selected Papers

Volume Editors
Deborah L. McGuinness, Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA. E-mail: [email protected]
James R. Michaelis, Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA. E-mail: [email protected]
Luc Moreau, University of Southampton, School of Electronics and Computer Science, Southampton SO17 1BJ, United Kingdom. E-mail: [email protected]

Library of Congress Control Number: 2010940987
CR Subject Classification (1998): H.3-4, D.4.6, I.2, H.5, K.6, K.4, C.2
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-17818-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-17818-4 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180

In Memoriam, Eleanor Louise McGuinness, 1917–2010

Preface
Interest in, and the need for, provenance are growing as data proliferate. Data are increasing in a wide array of application areas, including scientific workflow systems, logical reasoning systems, text extraction, social media, and linked data. As data volumes expand and as applications become more hybrid and distributed in nature, there is growing interest in where data came from and how they were produced, in order to understand when and how to rely on them. Provenance, or the origin or source of something, can capture a wide range of information. This includes, for example, who or what generated the data, the history of data stewardship, the manner of manufacture, the place and time of manufacture, and so on.
Annotation is tightly connected with provenance, since data are often commented on, described, and referred to. These descriptions or annotations are often critical to the understandability, reusability, and reproducibility of data, and are thus key components of today's data and knowledge systems.
Provenance has been recognized as important in a wide range of areas, including databases, workflows, knowledge representation and reasoning, and digital libraries. Consequently, many disciplines have proposed a wide range of provenance models, techniques, and infrastructure for encoding and using provenance.
One timely challenge for the broader community is to understand the strengths and weaknesses of the different approaches well enough to find and use the best model for any given situation. This comes at a time when a new incubator group has been formed at the World Wide Web Consortium (W3C) to provide a state-of-the-art understanding of, and develop a roadmap for, provenance in Semantic Web technologies, development, and possible standardization.
The Third International Provenance and Annotation Workshop (IPAW 2010) built on the success of previous workshops held in Salt Lake City (2008), Chicago (2006, 2002), and Edinburgh (2003). It was held on June 15–16, 2010, in Troy, New York, at Rensselaer Polytechnic Institute. IPAW 2010 brought together computer scientists from different areas and provenance users to discuss open problems related to the provenance of computational and non-computational artifacts. A total of 59 people attended the workshop, coming from the United States (USA), the United Kingdom (UK), the Netherlands, Germany, Brazil, and Japan.
We received 36 submissions in response to the initial call for papers, each of which was reviewed by at least three reviewers. Overall, 7 submissions were accepted as full papers, 11 as medium-length papers, 7 as demo papers, and 6 as short papers. In addition, a follow-up call for late-breaking work in the form of a poster and abstract was issued, which resulted in 10 additional contributions.
The workshop was organized as a single-track event with paper, poster, and demo sessions interleaved. Susan Davidson (University of Pennsylvania) presented a keynote address on provenance and privacy. On June 14, the day before IPAW 2010, a Provenance Hackathon organized by Paul Groth drew 28 participants.
The aim of the Provenance Hackathon was to see whether participants, grouped in teams, could quickly build end-user applications that demonstrate unique benefits of provenance by leveraging existing infrastructure and provenance models. A panel of three judges evaluated each application on its use of provenance and its usefulness. Details of the participating teams, as well as their provenance strategies and solutions, can be found at http://thinklinks.wordpress.com/2010/06/15/provenance-hackathon/
Immediately following IPAW 2010, a group of 28 researchers met to discuss plans for the fourth and last Provenance Challenge. The Provenance Challenge series was initiated to understand and compare the expressiveness of provenance systems; it then evolved into an interoperability challenge, in which provenance information is exchanged between systems. The Second Provenance Challenge led to the specification of a common provenance model, the Open Provenance Model [1], which was tested in the Third Provenance Challenge. The purpose of the fourth and last Provenance Challenge is to apply the Open Provenance Model to a broad end-to-end scenario and to demonstrate novel functionality that can only be achieved with an interoperable solution for provenance. The participants successfully identified a scenario and a set of provenance queries, as well as a draft schedule. Details can be found at http://twiki.ipaw.info/bin/view/Challenge/FourthProvenanceChallenge
After the challenge planning meeting, a group of 20 researchers met to discuss evolving issues with the provenance interlingua PML. This workshop was organized by Paulo Pinheiro da Silva. Applications and tools were presented, and use cases were articulated to help motivate and prioritize language extensions.
IPAW 2010 and its associated workshops were very successful, with much enthusiastic discussion and many new ideas generated.
We are grateful to STI Innsbruck for sponsoring the Provenance Hackathon, to Microsoft for sponsoring the banquet, and to Rensselaer for providing meeting space and staff support. We also thank the Program Committee members for their thorough reviews.

Reference
[1] Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche. The Open Provenance Model core specification (v1.1). Future Generation Computer Systems, July 2010. DOI: 10.1016/j.future.2010.07.005. URL: http://eprints.ecs.soton.ac.uk/21449/

Organization
IPAW 2010 was organized by the Tetherless World Constellation at Rensselaer Polytechnic Institute.

Workshop Co-chairs
Deborah L. McGuinness, Rensselaer Polytechnic Institute, USA
Luc Moreau, University of Southampton, UK

Program Committee
Christian Bizer, Freie Universität Berlin, Germany
James Cheney, University of Edinburgh, UK
Richard Cyganiak, DERI, Ireland
Susan Davidson, University of Pennsylvania, USA
Li Ding, Rensselaer Polytechnic Institute, USA
Ian Foster, University of Chicago, USA
Peter Fox, Rensselaer Polytechnic Institute, USA
Juliana Freire, University of Utah, USA
Alyssa Glass, Stanford University, USA
Paul Groth, Vrije Universiteit Amsterdam, The Netherlands
Olaf Hartig, Universität zu Berlin, Germany
Michael Hausenblas, DERI, Ireland
Bertram Ludaescher, University of California, Davis, USA
Marta Mattoso, UFRJ, Brazil
Simon Miles, King's College, UK
Paolo Missier, University of Manchester, UK
Jim Myers, NCSA, USA
Paulo Pinheiro da Silva, University of Texas, El Paso, USA
Beth Plale, Indiana University, USA
Satya Sahoo, Wright State University, USA
Yogesh Simmhan, Microsoft Research, USA
Kerry Taylor, CSIRO, Australia
Jan Van den Bussche, Universiteit Hasselt, Belgium
Evelyne Viegas, Microsoft Research, USA
Jun Zhao, University of Oxford, UK

Provenance Hackathon Chair
Paul Groth, Vrije Universiteit Amsterdam, The Netherlands
Publication Chair
James R. Michaelis, Rensselaer Polytechnic Institute, USA

Local Organizers
Jacky Carley, Rensselaer Polytechnic Institute, USA
Li Ding, Rensselaer Polytechnic Institute, USA
Alvaro Graves, Rensselaer Polytechnic Institute, USA
Timothy Lebo, Rensselaer Polytechnic Institute, USA
James P. McCusker, Rensselaer Polytechnic Institute, USA

Poster/Demonstration Session Chairs
Stephan Zednik, Rensselaer Polytechnic Institute, USA
Patrick West, Rensselaer Polytechnic Institute, USA

Sponsoring Institutions
Microsoft Corporation, Redmond, WA, USA
Rensselaer Polytechnic Institute, Troy, NY, USA
Springer, New York, NY, USA
STI Innsbruck, Innsbruck, Austria

Table of Contents

Keynotes
On Provenance and Privacy .... 1
Susan B. Davidson

Papers
The Provenance of Workflow Upgrades .... 2
David Koop, Carlos E. Scheidegger, Juliana Freire, and Cláudio T. Silva
Approaches for Exploring and Querying Scientific Workflow Provenance Graphs .... 17
Manish Kumar Anand, Shawn Bowers, Ilkay Altintas, and Bertram Ludäscher
Automatic Provenance Collection and Publishing in a Science Data Production Environment—Early Results .... 27
James Frew, Greg Janée, and Peter Slaughter
Leveraging the Open Provenance Model as a Multi-tier Model for Global Climate Research .... 34
Eric G. Stephan, Todd D. Halter, and Brian D. Ermold
Understanding Collaborative Studies through Interoperable Workflow Provenance .... 42
Ilkay Altintas, Manish Kumar Anand, Daniel Crawl, Shawn Bowers, Adam Belloum, Paolo Missier, Bertram Ludäscher, Carole A. Goble, and Peter M.A. Sloot
Provenance of Software Development Processes .... 59
Heinrich Wendel, Markus Kunde, and Andreas Schreiber
Provenance-Awareness in R .... 64
Chris A. Silles and Andrew R. Runnalls
SAF: A Provenance-Tracking Framework for Interoperable Semantic Applications .... 73
Evan W. Patton, Dominic Difranzo, and Deborah L. McGuinness
Publishing and Consuming Provenance Metadata on the Web of Linked Data .... 78
Olaf Hartig and Jun Zhao