Data Literacy: How to Make Your Experiments Robust and Reproducible

244 pages · 2017 · English

DATA LITERACY
HOW TO MAKE YOUR EXPERIMENTS ROBUST AND REPRODUCIBLE

Neil R. Smalheiser, MD, PhD
Associate Professor in Psychiatry, Department of Psychiatry and Psychiatric Institute, University of Illinois School of Medicine, USA

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2017 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-811306-6

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mica Haley
Acquisition Editor: Rafael E. Teixeira
Editorial Project Manager: Mariana L. Kuhl
Production Project Manager: Poulouse Joseph
Designer: Alan Studholme
Typeset by TNQ Books and Journals
Cover image credit and illustrations placed between chapters: Stephanie Muscat

What Is Data Literacy?

Being literate means (literally!) being able to read and write, but it also implies having a certain level of curiosity and acquiring enough background to notice, appreciate, and enjoy the finer points of a piece of writing. A person who has money literacy may not have taken courses in accounting or business, but is likely to know how much they have in the bank, to know whose face is on the 10-dollar bill, and to know roughly how much they spend on the electric bill each month. Many famous musicians have no formal training and cannot read sheet music (Jimi Hendrix and Eric Clapton, to name two), yet they do possess music literacy: they are able to recognize, produce, and manipulate melodies, harmonies, rhythms, and chord shifts. And data literacy? Almost everyone has some degree of data literacy: one speaks of 1 bird or 2 birds, but never 1.3 birds!

The goal of this book is to learn how a scientist looks at data: how a feeling for data permeates every aspect of a scientific investigation, touching on aspects of experimental design, data analysis, statistics, and data management. After acquiring scientific data literacy, you will not be able to hear about an experiment without automatically asking yourself a series of questions such as: “Is the sampling adequate in size, balanced, and unbiased? What are the positive and negative controls? Are the data properly cleansed and normalized?”

Data literacy makes a difference in daily life too: When a layperson goes to the doctor for a checkup, the nurse tells him or her to take off their shoes and they step on the scale (Fig. 1).

FIGURE 1 A nurse weighs a patient who seems worried; maybe he is thinking about the need for calibration and linearity of the measurement?

When a scientist goes to the doctor’s office, before they step on the scale, they tare the scale to make sure it reads zero when no weight is applied. Then, they find a known calibrated weight and put it on the scale, to make sure that it reads accurately (to within a few ounces). They may even take a series of weights that cover the range of their own weight (say, 100, 150, and 200 pounds) to make sure that the readings are linear within the range of the scale’s effective operation. They take the weight of their clothes (and contents of their pockets) into account, perhaps by estimation, perhaps by disrobing. Finally, they step on the scale. And then they do that three times and take the average of the three measurements!

This book is based upon a course that I have given to graduate students in neuroscience at the University of Illinois Medical School. Because most of the students are involved in laboratory animal studies, test-tube molecular biological studies, human psychological or neuroimaging studies, or clinical trials, I have chosen examples liberally from this sphere. Some of the examples do unavoidably have jargon, and a basic familiarity with science is assumed. However, the book should be readable and relevant to students and working scientists of any discipline, including physical sciences, biomedical sciences, social sciences, information science, and computer science.

Even though all graduate students have the opportunity to take courses on experimental design and statistics, I have found that the amount of material presented there is overwhelmingly comprehensive. Equally important, the authors of textbooks on those topics come from a different world than the typical student contemplating a career at the laboratory bench. (Hint: There is a hidden but yawning digital divide between the world of those who can program computer code, and those who cannot.) As a result, students tend to learn experimental design and statistics by rote yet do not achieve a basic, intuitive sense of data literacy that they can apply to their everyday scientific life.

Hence this book is not intended to replace traditional courses, texts, and online resources, but rather should be read as a prequel or supplement to them. I will try to illustrate points with examples and anecdotes, sometimes from my own personal experiences, and will offer more personal opinions, advice, and tips than you may be used to seeing in a textbook! On the other hand, I will not include problem sets and will cite only the minimum number of references to scholarly works.

Teaching is harder than it looks.

Acknowledgments

Thanks to John Larson for originally inviting me to teach a course on Data Literacy for students in the Graduate Program in Neuroscience at the University of Illinois Medical School in Chicago. I owe a particular debt of gratitude to the students in the class, whose questions and feedback have shaped the course content over several years. My colleagues Aaron Cohen and Maryann Martone gave helpful comments and corrections on selected chapters. Vetle Torvik and Giovanni Lugli have been particularly longstanding research collaborators of mine, and my adventures in experimental design and data analysis have often involved one or both of them. Finally, I thank my illustrator, Stephanie Muscat, who has a particular talent for capturing scientific processes in visual terms, simply and with humor.

Why This Book?

The scientific literature is increasing exponentially. Each day, about 2000 new articles are added to MEDLINE, a free and public curated database of peer-reviewed biomedical articles (http://www.pubmed.gov). And yet, the scientific community is currently faced with not one, but two major crises that threaten our continued progress.

First, a huge amount of waste occurs at every step in the scientific pipeline [1]: Most experiments that are carried out are preliminary (“pilot studies”), descriptive, small scale, incomplete, lack some controls for interpretation, have unclear significance, or simply do not give clear results. Of experiments that do give clear results, most are never published, and the majority of those published are never cited (and may not ever be read!). The original raw data acquired by the experimenter sits in a drawer or on a hard drive, eventually to be lost. Rarely are the data preserved in a form that allows others to view them, much less reuse them in additional research.

Second, a significant minority of published findings cannot be replicated by independent investigators. This is both a crisis of reproducibility (failing to find the same results even when trying to duplicate the experimental variables exactly) [2,3] and robustness (failing to find similar results when seemingly incidental variables are allowed to vary, e.g., when an experiment originally reported on 6-month-old Wistar rats is repeated on 8-month-old Sprague Dawley rats). The National Institutes of Health and leading journals and pharmaceutical companies have acknowledged the problem and its magnitude and are taking steps to improve the way that experiments are designed and reported [4–6].

What has brought us to this state of affairs? Certainly, a lack of data literacy is a contributing factor, and a major goal of this book is to cover issues that contribute to waste and that limit reproducibility and robustness. However, we also need to face the fact that the culture of science actively encourages scientists to engage in a number of engrained practices that, if we are being charitable, we would describe as outdated. The current system rewards scientists for publishing findings that lead to funding, citations, promotions, and awards. Unfortunately, none of these goals are under the direct control of the investigators themselves! Achieving high impact or winning an award is like achieving celebrity in Hollywood: capricious and unpredictable. One would like to believe that readers, reviewers, and funders will recognize and support work that is of high intrinsic quality, but evidence suggests that there is a high degree of randomness in manuscript and grant proposal scores [7,8], which can lead to superstitious behavior [9] and outright cheating.

In contrast, it is within the power of each scientist to make their data solid, reliable, extensive, and definitive in terms of findings. The interpretation of the data may be tentative and may not be “true” in some abstract or lasting sense, but at least others can build on the data in the future.

In fact, philosophically, there are some advantages to recentering the scientific enterprise around the desire to publish findings that are, first and foremost, robust and reproducible. As we will see, placing a high value on robustness and reproducibility empowers scientists and is part of a larger emerging movement that includes open access for publishing and open sharing of data.

Traditionally, a scientific paper is expected to present a coherent narrative with a strong interpretation and a clear conclusion; that is, it tells a good story and it has a good punch line! The underlying data are often presented in a highly compressed, summarized form, or not presented at all. Recently, however, there has been a move toward considering the raw data themselves to be the primary outcome of a scientific study, to be carefully described and preserved, while the authors’ own analyses and interpretation are considered secondary or even dispensable.

We can see why this may be a good idea: For example, let us consider a study hypothesizing that the age of the father (at the time of birth) correlates positively with the risk of their adult offspring developing schizophrenia [10]. Imagine that the raw data consist of a table of human male subjects listing their ages and other attributes, together with a list of their offspring and subsequent psychiatric histories. Different investigators might choose to analyze these raw data in different ways, which might affect or alter their conclusions: For example, one might correlate paternal ages with risk across the entire life cycle, while another might divide the subjects into categorical groups, e.g., “young” fathers (aged 14–21 years), “regular” fathers (aged 21–40 years), and “old” fathers (aged 40+ years). Another investigator might focus only on truly old fathers, e.g., aged 50 or even 60 years. Furthermore, investigators might correlate ages with overall prevalence of psychiatric illnesses, or any disease having psychotic features, or only those with a stable diagnosis of schizophrenia by the age of 30 years, etc. Without knowing the nature of the effect in advance, one could defend any of these ways of analyzing the data.

So, the same data can be sliced and diced in any number of ways, and the resulting publication can look very different depending on how the authors choose to proceed. Even if one accepts that there is some relationship between paternal age and schizophrenia (and this finding has been replicated many times in the past 15 years), it is not at all obvious what this finding “means” in terms of underlying mechanisms. One can imagine that older fathers might bring up their children differently (e.g., perhaps exposing their young offspring to old-fashioned discipline practices). Alternatively, older fathers may have acquired a growing number of point mutations in their sperm DNA over time! Subsequent follow-up studies may attempt to characterize the relationship of age to risk in more detail, and to test hypotheses regarding which possible mechanisms seem most likely. And of course, the true mechanism(s) might reflect genetic or environmental influences that are not even appreciated or known at the time that the relation of age to risk was first noticed.

To summarize, the emerging view is that the bedrock of a scientific paper is its data. The authors’ presentation and analysis of the data, resulting in its primary finding, is traditionally considered by most scientists to be the outcome of the paper, and it is this primary finding that ought to be robust and reproducible. However, as we have seen, the primary finding is a bit more subjective and removed from the data themselves, and according to the emerging view, it is NOT the bedrock of the paper. Rather, it is important that independent investigators should be able to view the raw data to reanalyze them, or compare or pool with other data obtained from other sources. Finally, the authors’ interpretation of the finding, and their general conclusions, may be insightful and point the way forward, but should be taken with a big grain of salt.

The status quo of scientific practice is changing, radically and rapidly, and it is important to understand these trends to do science in the 21st century. This book will provide a roadmap for students wishing to navigate each step in the pipeline, from hypothesis to publication, during this time of transition. Do not worry, this roadmap won’t turn you into a mere data collector. Finding novel, original, and dramatic findings, and achieving breakthroughs will remain as important as ever.

References

[1] Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet July 4, 2009;374(9683):86–9. http://dx.doi.org/10.1016/S0140-6736(09)60329-9.
[2] Ioannidis JP. Why most published research findings are false. PLoS Med August 2005;2(8):e124.
[3] Leek JT, Jager LR. Is most published research really false? bioRxiv April 27, 2016. http://dx.doi.org/10.1101/050575.
[4] Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, Crystal RG, Darnell RB, Ferrante RJ, Fillit H, Finkelstein R, Fisher M, Gendelman HE, Golub RM, Goudreau JL, Gross RA, Gubitz AK, Hesterlee SE, Howells DW, Huguenard J, Kelner K, Koroshetz W, Krainc D, Lazic SE, Levine MS, Macleod MR, McCall JM, Moxley 3rd RT, Narasimhan K, Noble LJ, Perrin S, Porter JD, Steward O, Unger E, Utz U, Silberberg SD. A call for transparent reporting to optimize the predictive value of preclinical research. Nature October 11, 2012;490(7419):187–91. http://dx.doi.org/10.1038/nature11556.
[5] Hodes RJ, Insel TR, Landis SC. On behalf of the NIH blueprint for neuroscience research. The NIH toolbox: setting a standard for biomedical research. Neurology 2013;80(11 Suppl. 3):S1. http://dx.doi.org/10.1212/WNL.0b013e3182872e90.
[6] Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature March 28, 2012;483(7391):531–3. http://dx.doi.org/10.1038/483531a.
[7] Cole S, Simon GA. Chance and consensus in peer review. Science November 20, 1981;214(4523):881–6.
[8] Snell RR. Menage a quoi? Optimal number of peer reviewers. PLoS One April 1, 2015;10(4):e0120838. http://dx.doi.org/10.1371/journal.pone.0120838.
[9] Skinner BF. Superstition in the pigeon. J Exp Psychol April 1948;38(2):168–72.
[10] Brown AS, Schaefer CA, Wyatt RJ, Begg MD, Goetz R, Bresnahan MA, Harkavy-Friedman J, Gorman JM, Malaspina D, Susser ES. Paternal age and risk of schizophrenia in adult offspring. Am J Psychiatry September 2002;159(9):1528–33.

How many potential new discoveries are filed away somewhere, unpublished, unfunded, and unknown?
