Learning Materials in Biosciences
Michael H. Herzog
Gregory Francis
Aaron Clarke
Understanding Statistics and Experimental Design
How to Not Lie with Statistics
Learning Materials in Biosciences
Learning Materials in Biosciences textbooks compactly and concisely discuss a specific
biological, biomedical, biochemical, bioengineering or cell biologic topic. The textbooks
in this series are based on lectures for upper-level undergraduates, master’s and graduate
students, presented and written by authoritative figures in the field at leading universities
around the globe.
The titles are organized to guide the reader to a deeper understanding of the concepts
covered.
Each textbook provides readers with fundamental insights into the subject and prepares
them to independently pursue further thinking and research on the topic. Colored figures,
step-by-step protocols and take-home messages offer an accessible approach to learning
and understanding.
In addition to being designed to benefit students, Learning Materials textbooks
represent a valuable tool for lecturers and teachers, helping them to prepare their own
respective coursework.
More information about this series at http://www.springer.com/series/15430
Michael H. Herzog
Brain Mind Institute
École Polytechnique Fédérale de Lausanne (EPFL)
Lausanne, Switzerland

Gregory Francis
Dept. Psychological Sciences
Purdue University
West Lafayette, IN, USA

Aaron Clarke
Psychology Department
Bilkent University
Ankara, Turkey
ISSN 2509-6125    ISSN 2509-6133 (electronic)
Learning Materials in Biosciences
ISBN 978-3-030-03498-6    ISBN 978-3-030-03499-3 (eBook)
https://doi.org/10.1007/978-3-030-03499-3
This book is an open access publication.
© The Editor(s) (if applicable) and The Author(s) 2019
Open Access This book is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0
International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use,
sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit
to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes
were made.
The images or other third party material in this book are included in the book’s Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons
licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
This work is subject to copyright. All commercial rights are reserved by the author(s), whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage
and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or
hereafter developed. Regarding these commercial rights a non-exclusive license has been granted to the publisher.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions
that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Science, Society, and Statistics
The modern world is inundated with statistics. Statistics determine what we eat, how we
exercise, who we befriend, how we educate our children, and what type of medical
treatment we use. Obviously, statistics are ubiquitous and, unfortunately, misunderstandings
about statistics are too. In Chap. 1, we will report that judges at law courts come to wrong
conclusions (conclusions about whether or not to send people to prison) because they
lack basic statistical understanding. We will show that patients committed suicide because
doctors did not know how to interpret the outcome of medical tests. Scientists are often
no better. We know colleagues who blindly trust the output of their statistical computer
programs, even when the results make no sense. We know of published scientific papers
containing results that are incompatible with the theoretical conclusions of the authors.
There is an old saying (sometimes attributed to Mark Twain, but apparently older) that
“There are lies, damn lies, and statistics.” We have to concede that the saying makes a
valid point. People do often misuse statistical analyses. Maybe it does not go all the way
to lying (making an intentionally false statement), but statistics often seem to confuse
rather than clarify issues. The title of this book reflects our efforts to improve the use of
statistics so that people who perform analyses will better interpret their data and people
who read statistics will better understand them. Understanding the core ideas of statistics
helps immediately to reveal that many scientific results are essentially meaningless and
explains why many empirical sciences are currently facing a replication crisis.
The confusion about statistics is especially frustrating because the core ideas are
actually pretty simple. Computing statistics can be very complicated (thus the need for
complicated algorithms and thick textbooks with deep theorems), but a good understanding
of the basic principles of statistics can be achieved by everyone.
In 2013, with these ideas in mind, we started teaching a course at the École Polytechnique
Fédérale de Lausanne in Switzerland on understanding statistics and experimental
design. Over the years, the course has become rather popular and draws in students from
biology, neuroscience, medicine, genetics, psychology, and bioengineering. Typically,
these students have already had one or more statistics classes that guided them through the
details of statistical analysis. In contrast, our course and this book are designed to flesh out
the basic principles of those analyses, to succinctly explain what they do, and to promote
a better understanding of their capabilities and limitations.
About This Book
Background and Goal. As mentioned, misunderstandings about statistics have become a
major problem in our societies. One problem is that computing statistics has become so
simple that good education seems not to be necessary. The opposite is true, however.
Easy-to-use statistical programs allow people to perform analyses without knowing what the
programs do and without knowing how to interpret the results. Untenable conclusions are
a common result. The reader will likely be surprised how big the problem is and how
useless a large number of studies are. In addition, the reader may be surprised that even
basic terms, such as the p-value, are likely different from what is commonly believed.
The main goal of this book is to provide a short and to-the-point exposition on the
essentials of statistics. Understanding these essentials will prepare the reader to understand
and critically evaluate scientific publications in many fields of science. We are not
interested in teaching the reader how to compute statistics. This can be left to the computer.
Readership. This book is for all citizens and scientists who want to understand the
principles of statistics and interpret statistics without going into the detailed mathematical
computations. It is perhaps surprising that this goal can be achieved with a very short book
and only a few equations. We think people (not just students or scientists) either with or
without previous statistics classes will benefit from the book.
We kept mathematics to the lowest level possible and provided non-mathematical
intuition wherever possible. We added equations only where they improve
understanding. Except for extremely basic math, only some basic notions from probability
theory are needed for understanding the main ideas. Most of the notions become intuitively
clear when reading the text.
What This Book Is Not About. This book is not a course in mathematical statistics (e.g.,
Borel algebras); it is not a traditional textbook on statistics that covers the many different
tests and methods; and it is not a manual of statistical analysis programs, such as SPSS
and R. The book is not a compendium explaining as many tests as possible. We tried to
provide just enough information to understand the fundamentals of statistics but not much
more.
What This Book Is About. In Part I, we outline the philosophy of statistics using a minimum
of mathematics to make key concepts readily understandable. We will introduce the most
basic t-test and show how confusions about basic probability can be avoided.
Understanding Part I helps the reader avoid the most common pitfalls of statistics and
understand what the most common statistical tests actually compute. We will introduce
null hypothesis testing without the complicated traditional approach and use, instead,
a simpler approach via Signal Detection Theory (SDT). Part II is more traditional and
introduces the classic tests ANOVA and correlations. Parts I and II provide the standard
statistics as they are commonly used. Part III shows that we have a science crisis because
simple concepts of statistics, such as the notion of replication, were misunderstood.
For example, the reader may be surprised that too many replications of an experiment
can be suspicious rather than a reflection of solid science. Just the basic notions and
concepts of Chap. 3 in Part I are needed to understand Part III, which includes ideas that
are not presented in other basic textbooks. Even though the main bulk of our book is
about statistics, we show how statistics is strongly related to experimental design. Many
statistical problems can be avoided by clever, which often means simple, designs.
We believe that the unique mixture of core concepts of statistics (Part I), a short and
distinct presentation of the most common statistical tests (Part II), and a new
meta-statistical approach (Part III) will not only provide a solid understanding of
statistics but also exciting and shocking insights into what determines our daily lives.
Materials. For teachers, PowerPoint presentations covering the content of the book are
available on request via e-mail: michael.herzog@epfl.ch.
Acknowledgements. We would like to thank Konrad Neumann and Marc Repnow for proofreading
the manuscript and Eddie Christopher, Aline Cretenoud, Max, Gertrud and Heike Herzog, Maya
Anna Jastrzebowska, Slim Kammoun, Ilaria Ricchi, Evelina Thunell, Richard Walker, He Xu, and
Pierre Devaud for useful comments. We sadly report that Aaron Clarke passed away during the
preparation of this book.
Lausanne, Switzerland    Michael H. Herzog
West Lafayette, IN, USA    Gregory Francis
Ankara, Turkey    Aaron Clarke
Contents
Part I The Essentials of Statistics
1 Basic Probability Theory
1.1 Confusions About Basic Probabilities: Conditional Probabilities
1.1.1 The Basic Scenario
1.1.2 A Second Test
1.1.3 One More Example: Guillain-Barré Syndrome
1.2 Confusions About Basic Probabilities: The Odds Ratio
1.2.1 Basics About Odds Ratios (OR)
1.2.2 Partial Information and the World of Disease
References
2 Experimental Design and the Basics of Statistics: Signal Detection Theory (SDT)
2.1 The Classic Scenario of SDT
2.2 SDT and the Percentage of Correct Responses
2.3 The Empirical d′
3 The Core Concept of Statistics
3.1 Another Way to Estimate the Signal-to-Noise Ratio
3.2 Undersampling
3.2.1 Sampling Distribution of a Mean
3.2.2 Comparing Means
3.2.3 The Type I and II Error
3.2.4 Type I Error: The p-Value is Related to a Criterion
3.2.5 Type II Error: Hits, Misses
3.3 Summary
3.4 An Example
3.5 Implications, Comments and Paradoxes
Reference
4 Variations on the t-Test
4.1 A Bit of Terminology
4.2 The Standard Approach: Null Hypothesis Testing
4.3 Other t-Tests
4.3.1 One-Sample t-Test
4.3.2 Dependent Samples t-Test
4.3.3 One-Tailed and Two-Tailed Tests
4.4 Assumptions and Violations of the t-Test
4.4.1 The Data Need to be Independent and Identically Distributed
4.4.2 Population Distributions are Gaussian Distributed
4.4.3 Ratio Scale Dependent Variable
4.4.4 Equal Population Variances
4.4.5 Fixed Sample Size
4.5 The Non-parametric Approach
4.6 The Essentials of Statistical Tests
4.7 What Comes Next?
Part II The Multiple Testing Problem
5 The Multiple Testing Problem
5.1 Independent Tests
5.2 Dependent Tests
5.3 How Many Scientific Results Are Wrong?
6 ANOVA
6.1 One-Way Independent Measures ANOVA
6.2 Logic of the ANOVA
6.3 What the ANOVA Does and Does Not Tell You: Post-Hoc Tests
6.4 Assumptions
6.5 Example Calculations for a One-Way Independent Measures ANOVA
6.5.1 Computation of the ANOVA
6.5.2 Post-Hoc Tests
6.6 Effect Size
6.7 Two-Way Independent Measures ANOVA
6.8 Repeated Measures ANOVA
7 Experimental Design: Model Fits, Power, and Complex Designs
7.1 Model Fits
7.2 Power and Sample Size
7.2.1 Optimizing the Design
7.2.2 Computing Power
7.3 Power Challenges for Complex Designs