SAGE
Los Angeles | London | New Delhi | Singapore | Washington DC

SAGE Publications Ltd
1 Oliver's Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Editor: Katie Metzler
Production editor: Ian Antcliff
Copyeditor: Christine Bitten
Proofreader: Kate Harrison
Marketing manager: Sally Ransom
Cover design: Francis Kenney
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

© Louise Corti, Veerle Van den Eynden, Libby Bishop, Matthew Woollard 2014

First published 2014

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

Library of Congress Control Number: 2013945867

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 978-1-4462-6725-7
ISBN 978-1-4462-6726-4 (pbk)
Contents

About the Authors
Acknowledgements
Preface
1 The Importance of Managing and Sharing Research Data
2 The Research Data Lifecycle
3 Research Data Management Planning
4 Documenting and Providing Context for Data
5 Formatting and Organizing Data
6 Storing and Transferring Data
7 Legal and Ethical Issues in Sharing Data
8 Rights Relating to Research Data
9 Collaborative Research: Data Management Strategies for Research Teams and Research Managers
10 Making Use of Other People's Research Data: Opportunities and Limitations
11 Publishing and Citing Research Data
Conclusion
Glossary of Abbreviations
Index

About the Authors

Louise Corti is an Associate Director of the UK Data Archive and Head of the Producer Relations and Collections Development teams. She also leads qualitative data activities at the Archive and directed UK Qualidata from 1998. She has 25 years of experience in archiving, sharing and using social science data, and particular expertise in the challenges of managing and sharing research data, standards for archiving qualitative data, and teaching with data. She has held numerous research awards in these areas, focusing on best practice, training and tools.

Veerle Van den Eynden manages the Research Data Management team at the UK Data Archive. This team provides expertise, guidance and training on data management and data sharing to researchers, to promote good data practices and optimise data sharing. She leads research and development projects on various aspects of research data management. Veerle has many years of experience researching interactions between people, plants and the environment, using a combination of social and natural science methods, and has experienced first-hand the benefits that data sharing brings to research.
Libby Bishop is a Manager in the Research Data Management team at the UK Data Archive, where she provides guidance, support and training on data management, data management planning and data sharing to researchers and data producers. She has particular expertise in ethics and data sharing, including informed consent for archiving data and the ethics of reusing data. She also develops and delivers training for the User Support and Training section of the UK Data Service, with a focus on secondary analysis of qualitative data.

Matthew Woollard is Director of the UK Data Archive. He has practical and theoretical experience in all aspects of data service infrastructure, providing leadership in data curation, archiving and preservation activities. From 2002 to 2006 he was Head of the History Data Service, and from 2006 to 2010 an Associate Director and Head of Digital Preservation and Systems at the UK Data Archive. He currently provides leadership and strategic direction for both the UK Data Archive and the ESRC-funded UK Data Service.

Acknowledgements

The authors would first like to thank the UK's Economic and Social Research Council (ESRC) for providing core funding for the UK Data Service, and its previous incarnations, since 1967. This continuity of support has enabled us to build one of the world's best national archiving centres and infrastructures, and to attract highly skilled researchers and support staff. This funding has also enabled us to play a leading role in developing proactive and capacity-building approaches to help researchers achieve better data curation practices. We would equally like to thank Jisc for providing more recent funding opportunities in research data management, from which we have benefited. These have inspired awareness and implementation of better local research data management practices, policies and infrastructure.
We thank Camille Corti-Georgiou, Sue Wood and Anne Etheridge for their help in proofing and preparing this manuscript. Other UK Data Archive staff members, Laurence Horton, Bethany Morgan-Brett and Mus Ahmet, contributed specific exercises that have been adapted for this book. We express our warmest thanks to the late Dr Alasdair Crockett, who co-authored the very first UK Data Archive Managing and Sharing Data guide back in 2006. Finally, thank you to all of our partners, Rob Lenart, Nick Snow, Allen Radtke and Penny Woollard, for bearing with the often highly antisocial hours we spent writing, revising and editing.

Preface

Researchers' responsibilities towards their research data are set to change across all domains of scientific endeavour. Research funders are increasingly mandating open access to research data; governments internationally are demanding transparency in research; the economic climate is requiring much greater reuse of data; and fear of data loss calls for more robust information security practices. All these factors mean that researchers will need to improve, enhance and professionalize their research data management skills to meet the challenge of producing the highest quality shareable and reusable research outputs in a responsible and efficient way. The promotion of these skills offers a strategic contribution to the research capacity-building programmes of the UK and other countries.

Robust research data management techniques give researchers and data professionals the skills required to deal with the rapid, and uneven, developments in the data management environment. Research funders in the UK, USA and across Europe are gradually implementing data management (and sharing) policies in order to maximize the openness of data and the transparency and accountability of the research they support. Journal publishers increasingly require that the data on which publications are based be submitted for peer review.
Research funders and data users recognize the long-term value of well-prepared data. Institutions, in turn, need good-quality research information infrastructure to manage the ethical and security risks of their data assets.

This book contains up-to-date, easy-to-digest information about managing and sharing research data. It covers the full range of research data skills that feature in the research data lifecycle and includes case studies and practical activities that can assist in understanding the core concepts and applications. The detailed guidance presented here results from many years of delivering best practice advice, guidance and training to a wide range of researchers. This has involved in-depth discussion of topical research projects and close work with researchers on developing solutions and tools that fit easily within standard research practices.

We define research data as any research materials, qualitative or quantitative, that result from primary data collection or generation, or that are derived from existing sources, and that are intended to be analysed in the course of a research project. The scope covers numerical data, textual data, digitized materials, images, recordings and modelling scripts.

The book is aimed at researchers at all levels, from the novice Masters or PhD student to the experienced professor or research group leader. For the junior researcher, the material helps to provide building blocks for a solid research knowledge base. For the experienced researcher, the information can help plug gaps in current knowledge, refresh and update their knowledge in areas of rapid technological change, and help them stay abreast of changes in legislation relating to the governance of research data or the ethics of research. It can also help reassure professional researchers that their practices are consistent with best practice.
While this book has been written with the social science researcher in mind, many of the issues and practical strategies we set out apply more widely to any researcher creating or working with data. The book is also aimed at the rapidly growing group of research support staff, in both academic institutions and government departments, tasked with aspects of managing and sharing data, whether in research grants and governance, ethical review, IT services or library support.

While the information set out in this book is not necessarily sequential, we recommend reading the chapters in the order presented, so that knowledge is built incrementally. The exercises are intended to help consolidate knowledge, and completing them will enable the reader to feel more confident in putting the skills into practice in a real-life research environment.

The following chapters of this handbook describe the key elements of data management that are essential to the safe handling and sharing of research data. Chapter 1 introduces the key drivers that have emerged for sharing data, and considers why it is beneficial to share data and, more practically, how data can be shared. In Chapter 2 we introduce the concept of the research data lifecycle and how it extends the typical research cycle. Chapter 3 deals with research data management planning: using a data management checklist, assigning roles and responsibilities, costing data management into a research project, and resourcing data management during research. Chapter 4 describes documenting and providing context and provenance information for quantitative and qualitative data, examining in detail how to describe studies and files in a collection of research data.
In Chapter 5 we cover formatting and organizing data, including file formats, data conversion, the organization of files and folders, data quality assurance, version control and authenticity, data transcription and data digitization. Chapter 6 discusses storing and transferring data, setting out best practices in backing up data, information security, data transmission and encryption, data disposal, working with file-sharing and collaborative environments, and the long-term storage and preservation of data. In Chapter 7 we discuss research ethics and privacy, dealing with the legal and ethical issues relevant to data sharing, and consider pathways for providing access to data. These include informed consent, statistical disclosure control, anonymization of data and the use of access control measures. Chapter 8 introduces Intellectual Property Rights in data, including copyright, database and other rights, and the (re)use of existing data sources in compliance with such rights. Chapter 9 recommends data strategies for collaborative research: developing standard operating protocols, procedures and shared resources, coordinating data records, and assigning data roles and responsibilities in a team. Chapter 10 covers how to make use of other people's research data, looking at the opportunities and challenges of reusing data, and provides six real-life case studies of data reuse. Finally, in Chapter 11 we examine publishing and citing data, including where to publish data and how to create citable data through the use of persistent digital identifiers.

The book also has an accompanying website, where additional exercises can be found, plus links from the UK Data Service. The URL is: http://ukdataservice.ac.uk/manage-data/handbook

ONE
The Importance of Managing and Sharing Research Data

Research data are the cornerstone of scientific knowledge, learning and innovation, and of our quest to understand, explain and develop humanity and the world around us. In the digital age, the generation of research data has not only grown exponentially, but data are nowadays very easily stored, kept and exchanged around the world. The demand for ensuring that the benefits of technological advances are employed to modernize how we treat and use research data is growing by the day.

Over 50 years ago, Watson and Crick (1953) published the structure of DNA in a single-page article in Nature, with no raw data to underpin their findings. More recently, The 1000 Genomes Project Consortium (2010) accompanied their publication in Nature with 4.9 terabases of DNA sequences, available through the project website and deposited in dbSNP, the database of single nucleotide polymorphisms (Kiermer, 2011). Genetics research is just one example of how the openness and exchange of information, including research data, can rapidly increase the speed of research and discovery to our advantage. Just consider the medical benefits of our growing genetic understanding.

Since 2000 there has been a boom both in the drivers of data sharing and in the development of the human and material capability to do so. Research funders are increasingly mandating easy and/or open access to research data, and requiring data management plans to ensure maximum quality, sustainability, accessibility and openness of research data. Publishers of academic findings demand that the supporting data can be accessed for scrutiny or further exploration. Governments internationally are demanding transparency in research, and the economic climate makes much greater reuse of data desirable in order to maximize the return on science investments. Many researchers themselves agree that lack of access to data impedes scientific progress.
Access to data means that scientific findings can be verified and scrutinized if needed. Society demands access to data: to enable businesses to employ new knowledge for the development of tools and applications; to allow organizations to question governmental policies and decisions; and to let thousands of citizens engage in research processes, or 'citizen science', to advance our collective scientific knowledge.

Researchers' responsibilities towards their research data are therefore changing across all domains of scientific endeavour. Researchers need to improve, enhance and professionalize their research data management skills to meet the challenge of producing the highest quality research outputs and sustainable data in a responsible and efficient way, with the ability to share and reuse such outputs. By data management we mean all data practices, manipulations, enhancements and processes that ensure that research data are of a high quality and are well organized, documented, preserved, sustainable, accessible and reusable. The promotion of data skills offers a strategic contribution to the research capacity-building programmes of the UK and other countries. Institutions, in turn, need high-quality research data management to address the ethical and security risks of their data assets. Robust research data management techniques give researchers, data professionals and those involved in supporting research the skills required to deal with the rapid, and uneven, developments in the data management environment.

The Data Sharing Agenda

Researchers have always understood the importance of sharing: sharing findings in scientific publications, sharing expertise through peer networking, and collaborating via learned societies.
Technological advances allow this sharing to be accelerated to a new level and applied in different ways: through open access to research publications and also to research data, tools, software and educational resources. The early 1990s saw calls for published research articles to be made available online, later extended to include a greater range of primary research materials. Key drivers in the acceleration towards opening up research data have been the OECD Principles and Guidelines for Access to Research Data from Public Funding and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. The Organization for Economic Cooperation and Development (OECD) principles declared that publicly funded research data are a public good, produced in the public interest, and that they should be made openly available with as few restrictions as possible, in a timely and responsible manner, without harming intellectual property (OECD, 2007). The Berlin Declaration called for promoting knowledge dissemination through the open access paradigm via the internet, which requires the World Wide Web to be sustainable, interactive and transparent, with openly accessible and compatible content and tools (Berlin Declaration, 2003). At a European level, the report of the High Level Expert Group on Scientific Data, noting the rising tide of data, proposed that we are on the verge of a great new leap in scientific capability, fuelled by data, with a need for a scientific e-infrastructure that supports seamless access, use, reuse and trust of data (European Commission, 2010). The report sketches the benefits and costs of accelerating the development of a fully functional e-infrastructure for scientific data. Open infrastructure, open culture and open content need to go hand in hand.