
Provenance and Annotation of Data: International Provenance and Annotation Workshop, IPAW 2006, Chicago, IL, USA, May 3-5, 2006, Revised Selected Papers

297 Pages·2006·7.59 MB·English


Lecture Notes in Computer Science 4145
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Luc Moreau, Ian Foster (Eds.)
Provenance and Annotation of Data
International Provenance and Annotation Workshop, IPAW 2006
Chicago, IL, USA, May 3-5, 2006
Revised Selected Papers

Volume Editors
Luc Moreau, University of Southampton, Southampton, UK
E-mail: [email protected]
Ian Foster, Argonne National Lab, University of Chicago, Chicago, U.S.A.
E-mail: [email protected]

Library of Congress Control Number: 2006933370
CR Subject Classification (1998): H.3, H.4, D.4, E.2, H.5, K.6, K.4
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-540-46302-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-46302-3 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 11890850 06/3142 543210

Preface

Provenance is a well understood concept in the study of fine art, where it refers to the documented history of an art object. Given that documented history, the object attains an authority that allows scholars to understand and appreciate its importance and context relative to other works. In the absence of such history, art objects may be treated with some skepticism by those who study and view them. Over the last few years, a number of teams have been applying this concept of provenance to data and information generated within computer systems. If the provenance of data produced by computer systems can be determined as it can for some works of art, then users will be able to understand (for example) how documents were assembled, how simulation results were determined, and how financial analyses were carried out. A key driver for this research has been e-Science.
Reproducibility of results and documentation of method have always been important concerns in science, and today scientists of many fields (such as bioinformatics, medical research, chemistry, and physics) see provenance as a mechanism that can help repeat scientific experiments, verify results, and reproduce data products. Likewise, provenance offers opportunities for the business world, since it allows for the analysis of processes that led to results, for instance to check they are well-behaved or satisfy constraints; hence, provenance offers the means to check compliance of processes, on the basis of their actual execution. Indeed, increasing regulation of many industries (for example, financial services) means that provenance recording is becoming a legal requirement.

Annotation is closely related to provenance. End users do more than produce and consume data: they comment on data and refer to it and to the results of queries upon it. Annotation is therefore an important aspect of communication. One user may want to highlight a point in data space for another to investigate further. They may wish to annotate the result of a query such that similar queries show the annotation. Such annotations then become valuable because they may also provide information about the origin of data. At the same time, we may wish to understand the provenance of annotations, which may be produced by people or programs. Hence, provenance and annotation offer complementary information that can help end-users understand how and why data products were derived.

The International Provenance and Annotation Workshop (IPAW 2006) was a follow-up to workshops in Chicago in October 2002 and in Edinburgh in December 2003. It brought together computer scientists and domain scientists with a common interest in issues of data provenance, process documentation, data derivation, and data annotation. IPAW 2006 was held on May 3-5, 2006 at the University of Chicago's Gleacher Center in downtown Chicago and was attended by roughly 45 participants.
We received 33 high-quality submissions in response to the call for papers. All submissions were reviewed by 3 reviewers. Overall, 26 papers were accepted for oral presentation: 4 papers, which came with consistently high rankings by reviewers, were awarded longer presentation time and space in the proceedings; 4 papers had shorter presentations, whereas the remaining 18 papers were regular. In addition, Juliana Freire (University of Utah) and Roger Barga (Microsoft Research) were invited to make keynote addresses.

The workshop was organized as a single-track event, interleaving formal presentations and discussions. It also included an entertaining "Gong Show" of outlandish, "outside-the-box" ideas. Slides and presentation materials can be found at http://www.ipaw.info/ipaw06/programme.html.

In a discussion session, the participants debated whether the time was right to propose standard models of provenance or standard interfaces for recording, querying, and administering provenance stores. An outcome of this discussion was that participants agreed on steps to set up a "Provenance Challenge," to include a provenance-tracking scenario and query-based evaluation; this challenge will enable different groups to test and compare their approaches for this scenario (cf. http://twiki.ipaw.info/bin/view/Challenge).

Overall this was a very successful event, with much enthusiastic discussion, many new ideas and insights generated, and a good community spirit. It was agreed that we should hold another of these events in about 18 months' time.

June 2006
Luc Moreau and Ian Foster

Organization

IPAW 2006 was organized by the University of Chicago, Argonne National Laboratory, and the University of Southampton.
Program Committee
Dave Berry, National e-Science Centre, UK
Peter Buneman, University of Edinburgh, UK
Ian Foster (co-chair), Argonne National Lab/University of Chicago, USA
James Frew, University of California, USA
Jim Hendler, University of Maryland, USA
Carole Goble, University of Manchester, UK
Reagan Moore, San Diego Supercomputer Center, USA
Luc Moreau (co-chair), University of Southampton, UK
Jim Myers, National Center for Supercomputing Applications, USA
York Sure, University of Karlsruhe, Germany
Ziga Turk, University of Ljubljana, Slovenia
Mike Wilde, Argonne National Lab/University of Chicago, USA
Hai Zhuge, Institute of Computing Technology, Chinese Academy of Sciences, China

Sponsors
IPAW 2006 was sponsored by Springer and Microsoft, and endorsed by the Global Grid Forum.

Table of Contents

Session 1: Keynotes
Automatic Generation of Workflow Provenance
    Roger S. Barga, Luciano A. Digiampietri
Managing Rapidly-Evolving Scientific Workflows
    Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, Huy T. Vo

Session 2: Applications
Virtual Logbooks and Collaboration in Science and Software Development
    Dimitri Bourilkov, Vaibhav Khandelwal, Archis Kulkarni, Sanket Totala
Applying Provenance in Distributed Organ Transplant Management
    Sergio Álvarez, Javier Vázquez-Salceda, Tamás Kifor, László Z. Varga, Steven Willmott
Provenance Implementation in a Scientific Simulation Environment
    Guy K. Kloss, Andreas Schreiber
Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering
    Nithya N. Vijayakumar, Beth Plale
Enabling Provenance on Large Scale e-Science Applications
    Miguel Branco, Luc Moreau

Session 3: Discussion

Session 4: Semantics 1
Harvesting RDF Triples
    Joe Futrelle
Mapping Physical Formats to Logical Models to Extract Data and Metadata: The Defuddle Parsing Engine
    Tara D. Talbott, Karen L. Schuchardt, Eric G. Stephan, James D. Myers
Annotation and Provenance Tracking in Semantic Web Photo Libraries
    Christian Halaschek-Wiener, Jennifer Golbeck, Andrew Schain, Michael Grove, Bijan Parsia, Jim Hendler
Metadata Catalogs with Semantic Representations
    Yolanda Gil, Varun Ratnakar, Ewa Deelman
Combining Provenance with Trust in Social Networks for Semantic Web Content Filtering
    Jennifer Golbeck

Session 5: Workflow
Recording Actor State in Scientific Workflows
    Ian Wootten, Omer Rana, Shrija Rajbhandari
Provenance Collection Support in the Kepler Scientific Workflow System
    Ilkay Altintas, Oscar Barney, Efrat Jaeger-Frank
A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows
    Shawn Bowers, Timothy McPhillips, Bertram Ludäscher, Shirley Cohen, Susan B. Davidson
Applying the Virtual Data Provenance Model
    Yong Zhao, Michael Wilde, Ian Foster

Session 6: Models of Provenance, Annotations and Processes
A Provenance Model for Manually Curated Data
    Peter Buneman, Adriane Chapman, James Cheney, Stijn Vansummeren
Issues in Automatic Provenance Collection
    Uri Braun, Simson Garfinkel, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Margo I. Seltzer
Electronically Querying for the Provenance of Entities
    Simon Miles
AstroDAS: Sharing Assertions Across Astronomy Catalogues Through Distributed Annotation
    Rajendra Bose, Robert G. Mann, Diego Prina-Ricotti

Session 7: Gong Show

Session 8: Systems
Security Issues in a SOA-Based Provenance System
    Victor Tan, Paul Groth, Simon Miles, Sheng Jiang, Steve Munroe, Sofia Tsasakou, Luc Moreau
Implementing a Secure Annotation Service
    Imran Khan, Ronald Schroeter, Jane Hunter
Performance Evaluation of the Karma Provenance Framework for Scientific Workflows
    Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Suresh Marru
Exploring Provenance in a Distributed Job Execution System
    Christine F. Reilly, Jeffrey F. Naughton
gLite Job Provenance
    František Dvořák, Daniel Kouřil, Aleš Křenek, Luděk Matyska, Miloš Mulač, Jan Pospíšil, Miroslav Ruda, Zdeněk Salvet, Jiří Sitera, Michal Voců

Session 9: Semantics 2
An Identity Crisis in the Life Sciences
    Jun Zhao, Carole Goble, Robert Stevens
CombeChem: A Case Study in Provenance and Annotation Using the Semantic Web
    Jeremy Frey, David De Roure, Kieron Taylor, Jonathan Essex, Hugo Mills, Ed Zaluska
Principles of High Quality Documentation for Provenance: A Philosophical Discussion
    Paul Groth, Simon Miles, Steve Munroe

Session 10: Final Discussion

Author Index

Automatic Generation of Workflow Provenance

Roger S. Barga¹ and Luciano A. Digiampietri²
¹ Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
² Institute of Computing, University of Campinas, Sao Paulo, Brazil
[email protected]

Abstract. While workflow is playing an increasingly important role in e-Science, current systems lack support for the collection of provenance data.
We argue that workflow provenance data should be automatically generated by the enactment engine and managed over time by an underlying storage service. We briefly describe our layered model for workflow execution provenance, which allows navigation from the conceptual model of an experiment to instance data collected during a specific experiment run, and back.

1 Introduction

Many scientific disciplines today are data and information driven, and new scientific knowledge is often obtained by scientists scheduling data intensive computing tasks across an ever-changing set of computing resources. Scientific workflows represent the logical culmination of this trend. They provide the necessary abstractions that enable effective usage of computational resources, and the development of robust problem-solving environments that marshal both data and computing resources.

Part of the established scientific method is to create a record of the origin of a result, how it was obtained, experimental methods used, the machines, calibrations and parameter settings, quantities, etc. It is the same with scientific workflows, except here the result provenance is a record of workflow activities invoked, services and databases used, their versions and parameter settings, data sets used or generated and so forth. Without this provenance data, the results of a scientific workflow are of limited value.

Though the value of result provenance is widely recognized, today most workflow management systems generate only limited provenance data. Consequently, this places the burden on the user to manually record provenance data in order to make an experiment or result explainable. Once the experiment is finished, the specific steps leading to the result are often stored away in a private file or notebook. However, weeks or months may pass before the scientist realizes a result was significant, by which time provenance has been forgotten or lost – it simply slips through the cracks.

We believe the collection of provenance data should be automatic, and the resulting provenance data should be managed by the underlying system. Moreover, a robust provenance trace should support multiple levels of representations. In some cases, it is suitable to provide an abstract description of the scientific process that took place, without describing specific codes executed, the data sets or remote services

L. Moreau and I. Foster (Eds.): IPAW 2006, LNCS 4145, pp. 1-9, 2006. © Springer-Verlag Berlin Heidelberg 2006

