Table Of Content.
SAGE
Los Angeles | London | New Delhi
Singapore | Washington DC
SAGE Publications Ltd
1 Oliver's Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483

© Louise Corti, Veerle Van den Eynden, Libby Bishop, Matthew Woollard 2014

First published 2014

Apart from any fair dealing for the purposes of research or private study, or
criticism or review, as permitted under the Copyright, Designs and Patents Act,
1988, this publication may be reproduced, stored or transmitted in any form, or by
any means, only with the prior permission in writing of the publishers, or in the
case of reprographic reproduction, in accordance with the terms of licences issued
by the Copyright Licensing Agency. Enquiries concerning reproduction outside
those terms should be sent to the publishers.
Editor: Katie Metzler
Production editor: Ian Antcliff
Copyeditor: Christine Bitten
Proofreader: Kate Harrison
Marketing manager: Sally Ransom
Cover design: Francis Kenney
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Library of Congress Control Number: 2013945867

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 978-1-4462-6725-7
ISBN 978-1-4462-6726-4 (pbk)
Contents
About the Authors
Acknowledgements
Preface
1 The Importance of Managing and Sharing Research Data
2 The Research Data Lifecycle
3 Research Data Management Planning
4 Documenting and Providing Context for Data
5 Formatting and Organizing Data
6 Storing and Transferring Data
7 Legal and Ethical Issues in Sharing Data
8 Rights Relating to Research Data
9 Collaborative Research: Data Management Strategies for Research Teams and Research Managers
10 Making Use of Other People's Research Data: Opportunities and Limitations
11 Publishing and Citing Research Data
Conclusion
Glossary of Abbreviations
Index
About the Authors
Louise Corti is an Associate Director of the UK Data Archive and Head of the
Producer Relations and Collections Development teams. She also leads qualitative
data activities at the Archive and directed UK Qualidata from 1998. She
has 25 years of expertise in archiving, sharing and using social science data, and
has particular expertise in the challenges of managing and sharing research
data, standards for archiving qualitative data, and teaching with data. She has
held numerous research awards in these areas focusing on best practice, training
and tools.
Veerle Van den Eynden manages the Research Data Management team at the
UK Data Archive. This team provides expertise, guidance and training on data
management and data sharing to researchers, to promote good data practices and
optimise data sharing. She leads research and development projects on various
research data management aspects. Veerle has many years of experience
researching interactions between people, plants and the environment, using a
combination of social and natural science methods, and has experienced first-hand
the benefits that data sharing brings to research.
Libby Bishop is a Manager in the Research Data Management team at the UK
Data Archive where she provides guidance, support and training on data management,
data management planning and data sharing to researchers and data
producers. She has particular expertise in ethics and data sharing, including
informed consent for archiving data and the ethics of re-using data. She also
develops and delivers training for the User Support and Training section of the
UK Data Service, with a focus on secondary analysis of qualitative data.
Matthew Woollard is Director of the UK Data Archive. He has practical and
theoretical experience in all aspects of data service infrastructure, providing leadership
in data curation, archiving and preservation activities. From 2002-2006
he was the Head of the History Data Service and from 2006-2010 an Associate
Director and Head of Digital Preservation and Systems at the UK Data Archive.
He currently provides leadership and strategic direction of both the UK Data
Archive and the ESRC-funded UK Data Service.
Acknowledgements
The authors would first like to thank the UK's Economic and Social Research
Council (ESRC) for providing core funding for the UK Data Service, and its
previous incarnations, since 1967. This continuity of support has enabled us to
build one of the world's best national archiving centres and infrastructures, and
to attract highly skilled researchers and support staff. This funding has also enabled
us to play a leading role in developing proactive and capacity-building
approaches to help researchers achieve better data curation practices. We would
equally like to thank Jisc for providing more recent funding opportunities on
research data management, from which we have benefited. These have inspired
awareness and implementation of better local research data management practices,
policies and infrastructure.
We thank Camille Corti-Georgiou, Sue Wood and Anne Etheridge for their
help in proofing and preparing this manuscript. Other UK Data Archive staff
members have contributed to specific exercises which have been adapted for
this book: Laurence Horton, Bethany Morgan-Brett and Mus Ahmet.
We express our warmest thanks to the late Dr Alasdair Crockett, who co-authored
the very first UK Data Archive Managing and Sharing Data guide back
in 2006.
Finally, thank you to all of our partners, Rob Lenart, Nick Snow, Allen Radtke
and Penny Woollard, for bearing with our often highly antisocial hours spent
writing, revising and editing.
Preface
Researchers' responsibilities towards their research data are set to change across
all domains of scientific endeavour. Research funders are increasingly mandating
open access to research data; governments internationally are demanding transparency
in research; the economic climate is requiring much greater reuse of data;
and fear of data loss calls for more robust information security practices. All these
factors mean that researchers will need to improve, enhance and professionalize
their research data management skills to meet the challenge of producing the
highest quality shareable and reusable research outputs in a responsible and efficient
way. The promotion of these skills offers a strategic contribution to the UK's
and other countries' research capacity building programmes.
Robust research data management techniques give researchers and data professionals
the skills required to deal with the rapid, and uneven, developments
in the data management environment. Research funders in the UK, USA and
across Europe are gradually implementing data management (and sharing) policies
in order to maximize the openness of data and the transparency and accountability of
the research they support. Journal publishers increasingly require submission of the
data upon which publications are based for peer review. Research funders and
data users recognize the long-term value of well-prepared data. And institutions
need good quality research information infrastructure to manage the ethical and
security risks of their data assets.
This book contains up-to-date, easy-to-digest information about managing and
sharing research data. It covers the full range of research data skills that feature
in the research data lifecycle and includes case studies and practical activities
that can assist in understanding the core concepts and applications. The detailed
guidance presented results from many years of delivering best practice advice,
guidance and training to a wide range of researchers. This has involved in-depth
discussion of topical research projects and close work with
researchers on developing solutions and tools that fit easily within standard
research practices. We define research data as any research materials resulting
from primary data collection or generation, qualitative or quantitative, or
derived from existing sources intended to be analysed in the course of a research
project. The scope covers numerical data, textual data, digitized materials,
images, recordings or modelling scripts.
The book is aimed at researchers at all levels, from the novice Masters or PhD
student to the experienced professor or research group leader. For the junior
researcher the material helps to provide building blocks for a solid research
knowledge base. For the experienced researcher the information can help plug
some of the gaps in current knowledge, can refresh and update their knowledge
in areas of rapid technological change, and can help them stay abreast of changes
in legislation relating to the governance of research data or the ethics of research.
It can also help reassure professional researchers that their practices are consistent
with best practices. While this book has been written with the social science
researcher in mind, many of the issues and practical strategies we set out are
applicable more widely to any researcher creating or working with data.
The book is also aimed at the rapidly increasing group of research support
staff, both in academic institutions and government departments, tasked with
aspects of managing and sharing data, be it research grant and research governance,
ethical review, IT services staff or library support.
While the information set out in this book is not necessarily sequential, we
recommend reading the chapters in the order presented, so that knowledge is
built incrementally. The exercises are intended to help consolidate knowledge
and completing them will enable the reader to feel more confident in putting
the skills into practice in a real-life research environment.
The following chapters of this handbook describe the key elements of data
management that are essential in enabling the safe handling and sharing of
research data. Chapter 1 introduces the key drivers for the practices of sharing
data that have emerged, and considers arguments for why it is beneficial to share
data and, more practically, how data can be shared. In Chapter 2 we introduce
the concept of the research data lifecycle and how this extends the typical cycle
of research. Chapter 3 deals with research data management planning, using a
data management checklist, assigning roles and responsibilities, costing data
management into a research project, and the resourcing of data management
during research.
Chapter 4 describes documenting and providing context and provenance
information for quantitative and qualitative data, examining in detail how to
describe studies and files in a collection of research data. In Chapter 5 we cover
formatting and organizing data, including file formats, data conversion, the
organizing of files and folders, data quality assurance, version control and
authenticity, data transcription and data digitization. Chapter 6 discusses the
storing and transferring of data, setting out best practices in backing up data,
providing information security, data transmission and encryption, data disposal,
working with file-sharing and collaborative environments, and the long-term
storage and preservation of data.
In Chapter 7 we discuss research ethics and privacy, dealing with legal and
ethical issues that are relevant for data sharing, and consider pathways for providing
access to data. These include informed consent, statistical disclosure control,
anonymization of data and use of access control measures. Chapter 8
introduces Intellectual Property Rights in data, including copyright, database and
other rights and the (re)use of existing data sources in compliance with such rights.
Chapter 9 recommends data strategies for collaborative research, developing
standard operating protocols, procedures and shared resources, coordinating data
records and assigning data roles and responsibilities in a team. Chapter 10 covers
how to make use of other people's research data, looking at opportunities for
and challenges of reusing data, and provides six real-life case studies of data
reuse. Finally, in Chapter 11 we examine publishing and citing data, including
where to publish data and how to create citable data through the use of persistent
digital identifiers.
The book also has an accompanying website where additional exercises can
be found, plus links from the UK Data Service. The URL is:
http://ukdataservice.ac.uk/manage-data/handbook
ONE
The Importance of Managing
and Sharing Research Data
Research data are the cornerstone of scientific knowledge, learning and innovation
and of our quest to understand, explain and develop humanity and the world around
us. In the digital age, the generation of research data has not only grown exponentially,
but data are nowadays very easily stored, kept and exchanged around the world. The
demand for ensuring that the benefits of technological advances are employed to
modernize how we treat and utilize research data is growing by the day.
Over 50 years ago, Watson and Crick (1953) published the structure of DNA
in a single page article in Nature, with no raw data to underpin their findings.
Recently, The 1000 Genomes Project Consortium (2010) accompanied their
publication in Nature with 4.9 terabases of DNA sequences available through
the project website and deposited in dbSNP, the database of single nucleotide
polymorphisms (Kiermer, 2011). Genetics research is just one example that
shows how the openness and exchange of information, including research data,
can rapidly drive up the speed of research and discovery to our advantage. Just
consider the medical benefits of our growing genetic understanding.
The period from 2000 has seen a boom both in the drivers of data sharing and
in the development of human and material capability to do so. Research
funders are increasingly mandating easy and/or open access to research data, and
data plans to ensure maximum quality, sustainability, accessibility and openness
of research data. Publishers of academic findings demand that the supporting
data can be accessed for scrutiny or further exploration. Governments internationally
are demanding transparency in research and the economic climate
makes it desirable for much greater reuse of data to maximize the return on
science investments. Many researchers themselves agree that lack of access to
data impedes scientific progress.
Access to data means that scientific findings can be verified and scrutinized
if needed. Society demands access to data: to enable businesses to employ new
knowledge for the development of tools and applications; to allow organizations
to question governmental policies and decisions; and for thousands of
citizens to engage in research processes, or 'citizen science', to advance our collective
scientific knowledge.
Researchers' responsibilities towards their research data are therefore changing
across all domains of scientific endeavour. Researchers need to improve,
enhance and professionalize their research data management skills to meet the
challenge of producing the highest quality research outputs and sustainable data
in a responsible and efficient way, with the ability to share and reuse such outputs.
By data management we mean all data practices, manipulations, enhancements
and processes that ensure that research data are of a high quality, are well
organized, documented, preserved, sustainable, accessible and reusable. The
promotion of data skills offers a strategic contribution to the UK's and other
countries' research capacity building programmes. And institutions need high-quality
research data management to address the ethical and security risks of
their data assets. Robust research data management techniques give researchers,
data professionals and those involved in supporting research the skills that are
required to deal with the rapid, and uneven, developments in the data management
environment.
The Data Sharing Agenda
Researchers have always understood the importance of sharing: sharing findings
in scientific publications; sharing expertise through peer networking; and collaboration
via learned societies. Technological advances allow this sharing to be
accelerated to a new level and applied in different ways: through open access to
research publications and also to research data, tools, software and educational
resources. The early 1990s saw a call for the opening up of published research
articles to be available online, later coming to include a greater range of primary
research materials.
Key drivers in the acceleration towards the opening up of research data have
been the OECD Principles and Guidelines for Access to Research Data from Public
Funding and the Berlin Declaration on Open Access to Knowledge in the Sciences
and Humanities. The Organization for Economic Cooperation and Development
(OECD) principles declared that publicly funded research data are a public good,
produced in the public interest, and that they should be made openly available with
as few restrictions as possible in a timely and responsible manner without harming
intellectual property (OECD, 2007). The Berlin Declaration called for promoting
knowledge dissemination through the open access paradigm via the internet, which
requires the worldwide web to be sustainable, interactive and transparent, with
openly accessible and compatible content and tools (Berlin Declaration, 2003).
At a European level the report of the High Level Expert Group on Scientific
Data, noting the rising tide of data, proposed that we are on the verge of a great
new leap in scientific capability, fuelled by data, with a need for a scientific
e-infrastructure that supports seamless access, use, reuse and trust of data
(European Commission, 2010). The report sketches the benefits and costs of
accelerating the development of a fully functional e-infrastructure for scientific
data. Open infrastructure, open culture and open content need to go hand in hand.