Lecture Notes in Computer Science 5272
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Juliana Freire, David Koop, Luc Moreau (Eds.)

Provenance and Annotation of Data and Processes
Second International Provenance and Annotation Workshop, IPAW 2008
Salt Lake City, UT, USA, June 17-18, 2008
Revised Selected Papers

Volume Editors

Juliana Freire
David Koop
University of Utah, School of Computing
Salt Lake City, UT 84112, USA
E-mail: {juliana,[email protected]}

Luc Moreau
University of Southampton
School of Electronics and Computer Science
Southampton SO17 1BJ, UK
E-mail: [email protected]

Library of Congress Control Number: 2008940592
CR Subject Classification (1998): H.3, H.4, D.4, E.2, H.5, K.6, K.4
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-540-89964-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-89964-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12569199 06/3180 543210

Preface

Computing has been an enormous accelerator to science and industry alike, and it has led to an information explosion in many different fields. The unprecedented volume of data acquired from sensors, derived by simulations and data analysis processes, accumulated in warehouses, and often shared on the Web, has given rise to a new field of research: provenance management. Provenance (also referred to as audit trail, lineage, and pedigree) captures information about the steps used to generate a given data product. Such information provides important documentation that is key to preserving data, to determining the data's quality and authorship, and to understanding, reproducing, and validating results.

Provenance management has become an active field of research, as evidenced by recent specialized workshops, surveys, and tutorials. Provenance solutions are needed in many different domains and applications, from environmental science and physics simulations to business processes and data integration in warehouses. Not surprisingly, different techniques and provenance models have been proposed in many areas such as workflow systems, visualization, databases, digital libraries, and knowledge representation. An important challenge we face today is how to integrate these techniques and models so that complete provenance can be derived for complex data products.

The International Provenance and Annotation Workshop (IPAW 2008) was a follow-up to previous workshops in Chicago (2006, 2002) and Edinburgh (2003). It was held during June 17–18, in Salt Lake City, at the University of Utah campus.
IPAW 2008 brought together computer scientists from different areas and provenance users to discuss open problems related to the provenance of computational and non-computational artifacts. A total of 55 people attended the workshop.

We received 40 submissions in response to the call for papers. Each submission was reviewed by at least three reviewers. Overall, 14 submissions were accepted as full papers, and 15 were accepted as short papers and demos. All accepted papers, short papers, and demos were invited for oral presentation at the workshop. Val Tannen (University of Pennsylvania) and Allen Brown (Microsoft Research) gave keynote addresses. The workshop was organized as a single track event with paper, poster, and demo sessions interleaved. Slides and presentation materials can be found at http://www.sci.utah.edu/ipaw2008/agenda.html.

Immediately following IPAW 2008, a group of 22 researchers got together to discuss a proposal for an Open Provenance Model (OPM). The aim of OPM is to allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance data model. The discussions were technical in nature, and touched upon fundamental notions such as provenance graphs, agents, alternate accounts, and permissible inferences. A summary can be found at http://twiki.ipaw.info/bin/view/Challenge/FirstOPMWorkshopMinutes. As a result of this workshop, a new version of the OPM was released, which is intended to be used in the next inter-operability exercise, the Third Provenance Challenge. Material related to OPM can be found at http://openprovenance.org/.

IPAW 2008 was a very successful event with much enthusiastic discussion and many new ideas generated. We thank Microsoft Research for sponsoring the workshop banquet. We also thank the Program Committee members for their thorough reviews.

Juliana Freire
Luc Moreau

Organization

IPAW 2008 was organized by the Department of Computer Science, University of Utah.

Workshop Co-chairs
Juliana Freire, University of Utah, USA
Luc Moreau, University of Southampton, UK

Program Committee
Roger Barga, Microsoft Research, USA
Ken Brodlie, University of Leeds, UK
Peter Buneman, University of Edinburgh, UK
James Cheney, University of Edinburgh, UK
Min Chen, Swansea University, UK
Susan Davidson, University of Pennsylvania, USA
Paul Groth, ISI, USA
Beth Plale, Indiana University, USA
Carole Goble, University of Manchester, UK
Ian Foster, University of Chicago, USA
Juliana Freire, University of Utah, USA
Bertram Ludäscher, UC Davis, USA
H. V. Jagadish, University of Michigan, USA
Marta Mattoso, UFRJ, Brazil
Simon Miles, King's College, UK
Luc Moreau, University of Southampton, UK
Jim Myers, NCSA, USA
Allen Renear, University of Illinois at Urbana-Champaign, USA
Margo Seltzer, Harvard University, USA
Claudio Silva, University of Utah, USA
Wang-Chiew Tan, UC Santa Cruz, USA
Jan Van den Bussche, Universiteit Hasselt, Belgium
Stijn Vansummeren, Universiteit Hasselt, Belgium
Daniel J. Weitzner, W3C

Web Co-chairs
Erik Jorgensen, University of Utah, USA
Tommy Ellkvist, Linköping University, Sweden

Local Organizers
David Koop, University of Utah, USA
Emanuele Santos, University of Utah, USA

Sponsoring Institutions
Microsoft Corporation, Redmond, WA, USA
Scientific Computing and Imaging Institute, University of Utah, USA
Springer, New York, NY, USA
University of Utah, Salt Lake City, UT, USA

Table of Contents

Keynotes

Provenance for Database Transformations ... 1
    Val Tannen
Enforcing the Scientific Method ... 2
    Allen L. Brown, Jr.

Papers

Mapping the NRC Dataflow Model to the Open Provenance Model ... 3
    Natalia Kwasnikowska and Jan Van den Bussche
Data Lineage Model for Taverna Workflows with Lightweight Annotation Requirements ... 17
    Paolo Missier, Khalid Belhajjame, Jun Zhao, Marco Roos, and Carole Goble
A Logic Programming Approach to Scientific Workflow Provenance Querying ... 31
    Yong Zhao and Shiyong Lu
Recording the Context of Action for Process Documentation ... 45
    Ian Wootten and Omer Rana
User-Centric Annotation Management for Biological Data ... 54
    Qinglan Li, Alexandros Labrinidis, and Panos K. Chrysanthis
A Model for Sharing of Confidential Provenance Information in a Query Based System ... 62
    Meiyappan Nagappan and Mladen A. Vouk
Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life ... 70
    Shawn Bowers, Timothy McPhillips, Sean Riddle, Manish Kumar Anand, and Bertram Ludäscher
Using Visualization Process Graphs to Improve Visualization Exploration ... 78
    T.J. Jankun-Kelly
Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures ... 92
    Zheng Chen and Luc Moreau
Provenance and the Price of Identity ... 106
    Adriane Chapman and H.V. Jagadish
Towards Provenance-Enabling ParaView ... 120
    Steven P. Callahan, Juliana Freire, Carlos E. Scheidegger, Cláudio T. Silva, and Huy T. Vo
Application of Provenance for Automated and Research Driven Workflows ... 128
    Tara Gibson, Karen Schuchardt, and Eric Stephan
Using Provenance to Improve Workflow Design ... 136
    Frederico T. de Oliveira, Leonardo Murta, Claudia Werner, and Marta Mattoso
Job Provenance – Insight into Very Large Provenance Datasets: Software Demonstration ... 144
    Aleš Křenek, Luděk Matyska, Jiří Sitera, Miroslav Ruda, František Dvořák, Jiří Filipovič, Zdeněk Šustr, and Zdeněk Salvet
A Provenance-Based Fault Tolerance Mechanism for Scientific Workflows ... 152
    Daniel Crawl and Ilkay Altintas
A First Study on Clustering Collections of Workflow Graphs ... 160
    Emanuele Santos, Lauro Lins, James P. Ahrens, Juliana Freire, and Cláudio T. Silva
Exploiting Provenance to Make Sense of Automated Decisions in Scientific Workflows ... 174
    Paolo Missier, Suzanne Embury, and Richard Stapenhurst
Using Explicit Control Processes in Distributed Workflows to Gather Provenance ... 186
    Sérgio Manuel Serra da Cruz, Fernando Seabra Chirigati, Rafael Dahis, Maria Luiza M. Campos, and Marta Mattoso
ES3: A Demonstration of Transparent Provenance for Scientific Computation ... 200
    James Frew and Peter Slaughter
Neuroimaging Data Provenance Using the LONI Pipeline Workflow Environment ... 208
    Allan J. MacKenzie-Graham, Arash Payan, Ivo D. Dinov, John D. Van Horn, and Arthur W. Toga
Provenance Tracking in an Earth Science Data Processing System ... 221
    Curt Tilmes and Albert J. Fleig
A Python Library for Provenance Recording and Querying ... 229
    Carsten Bochner, Roland Gude, and Andreas Schreiber
Requirements for a Provenance Visualization Component ... 241
    Markus Kunde, Henning Bergmeyer, and Andreas Schreiber
Advances and Challenges for Scalable Provenance in Stream Processing Systems ... 253
    Archan Misra, Marion Blount, Anastasios Kementsietsidis, Daby Sow, and Min Wang
Using Provenance to Support Real-Time Collaborative Design of Workflows ... 266
    Tommy Ellkvist, David Koop, Erik W. Anderson, Juliana Freire, and Cláudio Silva
Provenance in Sensornet Republishing ... 280
    Unkyu Park and John Heidemann
Semantically-Enhanced Model-Experiment-Evaluation Processes (SeMEEPs) within the Atmospheric Chemistry Community ... 293
    Chris Martin, Mohammed H. Haji, Peter Dew, Mike Pilling, and Peter Jimack
Oceanographic Data Provenance Tracking with the Shore Side Data System ... 308
    Michael McCann and Kevin Gomes

Invited Contribution

The Open Provenance Model: An Overview ... 323
    Luc Moreau, Juliana Freire, Joe Futrelle, Robert E. McGrath, Jim Myers, and Patrick Paulson

Author Index ... 327