Preface

This volume contains the proceedings of the 5th International Provenance and Annotation Workshop (IPAW), held during June 10–11, 2014 at the German Aerospace Center (DLR) in Cologne, Germany. For the first time, IPAW colocated with the Workshop on the Theory and Practice of Provenance (TaPP). Together the two leading provenance workshops anchored ProvenanceWeek 2014, a full week of provenance-related activities that included a shared poster session, a panel on reproducibility in science, and tutorials on the W3C PROV standard, on provenance analytics, and the uses of provenance in cell biology. The week was rounded out with afternoon-long birds-of-a-feather activities around constructing a provenance record from data when provenance was not collected in the first place, and benchmarking of provenance systems.

This collection constitutes the peer-reviewed papers of IPAW 2014. These include 14 long papers which report in-depth the results of research around provenance and four extended abstracts that discuss tools and services that were presented in the form of a system demonstration. Finally, we have included 20 short abstracts of the joint IPAW/TaPP poster session. The final papers, demos, and poster abstracts were selected from a total of 53 submissions. All full-length research papers and demo papers received a minimum of three reviews.

The papers of IPAW 2014 provided a glimpse into state-of-the-art research and practice around the capture, representation, and use of provenance. Since provenance often results in graphs, and large ones at that, several of the papers in this collection proposed abstract graph models and methods with well-defined properties, properties that can hold even when sanitized for potentially sensitive information. Tools are the focus of a number of papers in this collection; these are innovative software applications that solve a particular problem and are evaluated experimentally. They are often converging on the W3C PROV model for provenance interchange.
Some papers discussed tools that enable provenance capture from software compilers, from web publications, and from scripts, using existing audit logs, and employing both static and dynamic instrumentation. New methodologies for provenance aggregation and use appeared in the collection as well. We see the evaluation of a linked data approach to provenance publishing, the generation of documentation from provenance, and application of provenance to protect attribution in scientific discovery.

In closing, we would like to thank the members of the Program Committee for their thoughtful reviews, Dr. Andreas Schreiber (Local Chair) and Carina Haupt for their excellent organization of IPAW and ProvenanceWeek 2014 at DLR, and, last but not least, the authors and participants for making IPAW the stimulating and successful event that it was.

December 2014
Bertram Ludäscher
Beth Plale

Organization

Program Committee

Ilkay Altintas          University of California, San Diego, USA
Khalid Belhajjame       PSL, Université Paris-Dauphine, LAMSADE, France
Shawn Bowers            Gonzaga University, USA
Adriane Chapman         The MITRE Corporation, USA
James Cheney            University of Edinburgh, UK
Susan Davidson          University of Pennsylvania, USA
Tom De Nies             Ghent University - iMinds - Multimedia Lab, Belgium
Kai Eckert              University of Mannheim, Germany
Juliana Freire          NYU Polytechnic School of Engineering, USA
James Frew              Bren School / UCSB, USA
Daniel Garijo           Universidad Politécnica de Madrid, Spain
Yolanda Gil             USC/ISI, USA
Paul Groth              VU University Amsterdam, The Netherlands
Trung Dong Huynh        University of Southampton, UK
H. V. Jagadish          University of Michigan, USA
David Koop              NYU Polytechnic School of Engineering, USA
Carl Lagoze             University of Michigan School of Information, USA
Timothy Lebo            Rensselaer Polytechnic Institute, USA
Qing Liu                CSIRO, Australia
Shiyong Lu              Wayne State University, USA
Bertram Ludäscher       University of California, Davis, USA
Tanu Malik              University of Chicago, USA
Marta Mattoso           COPPE - Federal Univ. Rio de Janeiro, Brazil
Deborah McGuinness      Rensselaer Polytechnic Institute, USA
Simon Miles             King’s College London, UK
Paolo Missier           Newcastle University, UK
Luc Moreau              University of Southampton, UK
Beth Plale              Indiana University, USA
Yogesh Simmhan          Indian Institute of Science, India
Curt Tilmes             NASA GSFC, USA
Jan Van Den Bussche     Hasselt University and University of Limburg

Contents

Standardization of Provenance Models, Services, Representations

ProvAbs: Model, Policy, and Tooling for Abstracting PROV Graphs . . . 3
    Paolo Missier, Jeremy Bryans, Carl Gamble, Vasa Curcin, and Roxana Danger
ProvGen: Generating Synthetic PROV Graphs with Predictable Structure . . . 16
    Hugo Firth and Paolo Missier

Applications of Provenance

Walking into the Future with PROV Pingback: An Application to OPeNDAP Using Prizms . . . 31
    Timothy Lebo, Patrick West, and Deborah L. McGuinness
Provenance for Online Decision Making . . . 44
    Amir Sezavar Keshavarz, Trung Dong Huynh, and Luc Moreau
Regenerating and Quantifying Quality of Benchmarking Data Using Static and Dynamic Provenance . . . 56
    Devarshi Ghoshal, Arun Chauhan, and Beth Plale

Provenance Management Architectures and Techniques

noWorkflow: Capturing and Analyzing Provenance of Scripts . . . 71
    Leonardo Murta, Vanessa Braganholo, Fernando Chirigati, David Koop, and Juliana Freire
LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance . . . 84
    Pinar Alper, Khalid Belhajjame, Carole A. Goble, and Pinar Karagoz
Auditing and Maintaining Provenance in Software Packages . . . 97
    Quan Pham, Tanu Malik, and Ian Foster

Security and Privacy Implications of Provenance

An Analytical Survey of Provenance Sanitization . . . 113
    James Cheney and Roly Perera
A Provenance-Based Policy Control Framework for Cloud Services . . . 127
    Mufajjul Ali and Luc Moreau
Applying Provenance to Protect Attribution in Distributed Computational Scientific Experiments . . . 139
    Luiz M.R. Gadelha Jr. and Marta Mattoso

Provenance Discovery and Data Reproducibility

Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation . . . 155
    Manolis Stamatogiannakis, Paul Groth, and Herbert Bos
Generating Scientific Documentation for Computational Experiments Using Provenance . . . 168
    Adianto Wibisono, Peter Bloem, Gerben K.D. de Vries, Paul Groth, Adam Belloum, and Marian Bubak
Computing Location-Based Lineage from Workflow Specifications to Optimize Provenance Queries . . . 180
    Saumen Dey, Sven Köhler, Shawn Bowers, and Bertram Ludäscher

System Demonstrations

Interrogating Capabilities of IoT Devices . . . 197
    Stanislav Beran, Edoardo Pignotti, and Peter Edwards
A Lightweight Provenance Pingback and Query Service for Web Publications . . . 203
    Tom De Nies, Robert Meusel, Dominique Ritze, Kai Eckert, Anastasia Dimou, Laurens De Vocht, Ruben Verborgh, Erik Mannens, and Rik Van de Walle
Provenance-Based Searching and Ranking for Scientific Workflows . . . 209
    Víctor Cuevas-Vicenttín, Bertram Ludäscher, and Paolo Missier
PROV-O-Viz - Understanding the Role of Activities in Provenance . . . 215
    Rinke Hoekstra and Paul Groth

Joint IPAW/TaPP Poster Session

The Aspect-Oriented Architecture of the CAPS Framework for Capturing, Analyzing and Archiving Provenance Data . . . 223
    Peer C. Brauer, Florian Fittkau, and Wilhelm Hasselbring
Improving Workflow Design Using Abstract Provenance Graphs . . . 226
    Tianhong Song, Saumen Dey, Shawn Bowers, and Bertram Ludäscher
Early Discovery of Tomato Foliage Diseases Based on Data Provenance and Pattern Recognition . . . 229
    Diogo Nunes, Carlos Werly, Gizelle Kupac Vianna, and Sérgio Manuel Serra da Cruz
Provenance in Open Data Entity-Centric Aggregation . . . 232
    Fausto Giunghiglia and Moaz Reyad
Enhancing Provenance Representation with Knowledge Based on NFR Conceptual Modeling: A Softgoal Catalog Approach . . . 235
    Sérgio Manuel Serra da Cruz and André Luiz de Castro Leal
Provenance Storage, Querying, and Visualization in PBase . . . 239
    Víctor Cuevas-Vicenttín, Parisa Kianmajd, Bertram Ludäscher, Paolo Missier, Fernando Chirigati, Yaxing Wei, David Koop, and Saumen Dey
Engineering Choices for Open World Provenance . . . 242
    M. David Allen, Adriane Chapman, and Barbara Blaustein
Towards Supporting Provenance Gathering and Querying in Different Database Approaches . . . 254
    Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary A.C.S. Ocaña, and Marta Mattoso
Provenance for Explaining Taxonomy Alignments . . . 258
    Mingmin Chen, Shizhuo Yu, Parisa Kianmajd, Nico Franz, Shawn Bowers, and Bertram Ludäscher
Challenges for Provenance Analytics Over Geospatial Data . . . 261
    Daniel Garijo, Yolanda Gil, and Andreas Harth
Adaptive RDF Query Processing Based on Provenance . . . 264
    Marcin Wylot, Philippe Cudré-Mauroux, and Paul Groth
Using Well-Founded Provenance Ontologies to Query Meteorological Data . . . 267
    Thiago Silva Barbosa, Ednaldo O. Santos, Gustavo B. Lyra, and Sérgio Manuel Serra da Cruz
Applying W3C PROV to Express Geospatial Provenance at Feature and Attribute Level . . . 271
    Joan Masó, Guillem Closa, and Yolanda Gil
ProvStore: A Public Provenance Repository . . . 275
    Trung Dong Huynh and Luc Moreau