Statistics and Computing
Thomas Haslwanter
An Introduction
to Statistics
with Python
With Applications in the Life Sciences
Second Edition
Statistics and Computing
Series Editor
Wolfgang Karl Härdle, Humboldt-Universität zu Berlin, Berlin, Germany
Statistics and Computing (SC) includes monographs and advanced texts on statistical
computing and statistical packages.
More information about this series at https://link.springer.com/bookseries/3022
Thomas Haslwanter
School of Medical Engineering and Applied
Social Sciences
University of Applied Sciences Upper Austria
Linz, Austria
ISSN 1431-8784 ISSN 2197-1706 (electronic)
Statistics and Computing
ISBN 978-3-030-97370-4 ISBN 978-3-030-97371-1 (eBook)
https://doi.org/10.1007/978-3-030-97371-1
1st edition: © Springer International Publishing Switzerland 2016
2nd edition: © Springer Nature Switzerland AG 2022
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my two-, three-, and four-legged
household companions: Jean, Felix, and his
sister Jessica: Thank you so much for all the
support you have provided over the years!
Preface
Preface to the First Edition
In the data analysis for my own research work, I was often slowed down by two
things: (1) I did not know enough statistics, and (2) the books available would provide
a theoretical background, but no real practical help. The book you are holding in your
hands (or on your tablet or laptop) is intended to be the book that will solve this very
problem. It is designed to provide enough basic understanding so that you know what
you are doing, and it should equip you with the tools you need. I believe that the
Python solutions provided in this book for the most basic statistical problems address
at least 90% of the problems that most physicists, biologists, and medical doctors
encounter in their work. So if you are the typical graduate student working on your
degree, or a medical researcher analyzing your latest experiments, chances are that
you will find the tools you require here, explanation and source code included.
This is the reason I have focused on statistical basics and hypothesis tests in
this book, and refer only briefly to other statistical approaches. I am well aware
that most of the tests presented in this book can also be carried out using statistical
modeling. But this is often not the methodology used in life science
journals. Advanced statistical analysis goes beyond the scope of this book and, to
be frank, exceeds my own knowledge of statistics.
My motivation for providing the solutions in Python is based on two considerations.
One is that I would like them to be available to everyone. While commercial
solutions like Matlab, SPSS, Minitab, etc. offer powerful tools, most people can only use
them legally in an academic setting. In contrast, Python is completely free ("as in free
beer" is often heard in the Python community). The second reason is that Python is the
most beautiful coding language that I have yet encountered; and around 2010 Python
and its documentation matured to the point where one can use it without being a
serious coder. Together, this book, Python, and the tools that the Python ecosystem
offers today provide a beautiful, free package that covers all the statistics that most
researchers will need in their lifetime.
Preface to the Second Edition
Since the publication of the first edition, Python has continuously gained popularity
and become firmly established as one of the foremost programming languages for
statistical data analysis. All the core packages have matured. And thanks to the
stunning development of Jupyter as an interactive programming environment, Python
has become even more accessible for people with little programming background.
To reflect these developments, and to incorporate the suggestions I have received for
improving the presentation of the material, Springer has given me the opportunity to
bring out a new edition of Introduction to Statistics with Python.
Compared to the first edition, the following changes have been made:
• The package pandas and its DataFrames have become an integral part of
  scientific Python, as has the Jupyter framework for interactive data environments.
  Correspondingly, more space has been dedicated to their introduction.
• A new package, pingouin, promises a simplified and more powerful interface
  for many common statistics functions. This package is introduced, and many
  application examples have been added.
• The visualization of data has been expanded, including the preparation of
  publication-ready graphics.
• The design of experiments and power analyses are discussed in more detail.
• A new section has been added on the confidence intervals of frequently used
  statistical parameters.
• A new chapter has been added on finding patterns in data, including an introduction
  to the correlation coefficient, cross- and autocorrelation. For an application of
  these concepts, a short introduction is given to time series analysis.
As for the first edition, all examples and solutions from this book are again available
online. This includes code samples and example programs, Jupyter Notebooks
with additional or extended information, as well as the data and Python code used to
generate most of the figures. They can be downloaded from
https://github.com/thomas-haslwanter/statsintro-python-2e.
I hope this book will help you with the statistical analysis of your data, and
convey some of the often really simple ideas behind the sometimes awkwardly named
statistical analysis procedures.
For Whom This Book Is
This book assumes that
• you have some basic programming experience: If you have done no programming
  previously, you may want to start out with Python using some of the great links
  provided in the text. Starting programming and starting statistics may be a bit
  much all at once. However, solutions provided to the exercises at the end of most
  chapters should help you to get up to speed with Python.
• you are not a statistics expert: If you have advanced statistics experience, the
  online help in Python and the Python packages may be sufficient to allow you to
  do most of your data analysis right away. This book may still help you to get started
  with Python. However, the book concentrates on the basic ideas of statistics and
  on hypothesis tests, and only the last part introduces linear regression modeling
  and Bayesian statistics.
This book is designed to give you all (or at least most of) the tools that you
will need for statistical data analysis. I attempt to provide the background you need
to understand what you are doing. I do not prove any theorems, and do not apply
mathematics unless necessary. For all tests, a working Python program is provided.
In principle, you just have to define your problem, select the corresponding program,
and adapt it to your needs. This should allow you to get going quickly, even if you
have little Python experience. This is also the reason why I have not provided the
software as one single Python package; I expect that you will have to tailor each
program to your specific setup (data format, etc.).
This book is organized into three parts:
Part I gives an introduction to Python: how to set it up, simple programs to get
started, and tips on how to avoid some common mistakes. It also shows how to
read data from different sources into Python, and how to visualize statistical data.
Part II provides an introduction to statistical analysis; on how to design a study,
power analysis, and how best to analyze data; probability distributions; and an
overview of the most important hypothesis tests. Even though modern statistics
is firmly based in statistical modeling, hypothesis tests still seem to dominate the
life sciences. For each test, a Python program is provided that shows how the test
can be implemented.
Part III provides an introduction to correlation and regression analysis, time
series analysis, and statistical modeling, and a look at advanced statistical analysis
procedures. I have also included tests on discrete data in this section, such as
logistic regression, as they utilize “generalized linear models” which I regard
as advanced. This part ends with a presentation of the basic ideas of Bayesian
statistics.
To achieve all those goals as quickly as possible, Appendix A of the book
provides hints on how to efficiently develop correct and working code. This should
get you to the point where you can get things done quickly.
Acknowledgments
Python is built on the contributions from the user community, and some of the sections
in this book are based on some of the excellent information available on the web.
(Permission has been granted by the authors to reprint their contributions here.)
I especially want to thank the following people:
• Christiane Takacs helped me enormously by polishing the introductory statistics
  sections.
• Connor Johnson wrote a very nice blog explaining the results of the statsmodels
  OLS command, which provided the basis for the section on Statistical Models.
• Cam Davidson Pilon wrote the excellent open-source e-book
  Probabilistic-Programming-and-Bayesian-Methods-for-Hackers. From there, I took the
  example of the Challenger disaster to demonstrate Bayesian statistics.
• Fabian Pedregosa’s blog on ordinal logistic regression allowed me to include this
  topic, which would otherwise admittedly be beyond my own skills.
I also want to thank Springer Publishing for the chance to bring out the second
edition of this book, and to base the three introductory chapters (Python, Data Import,
and Data Display) to a significant part on the corresponding chapters of my book
Hands-on Signal Analysis with Python.
If you have a suggestion or correction, please send an email to my work address
thomas.haslwanter@fh-ooe.at. If I make a change based on your feedback, I will add
you to the list of contributors unless advised otherwise. If you include at least part
of the sentence the error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with. Thanks!
Linz, Austria                                                    Thomas Haslwanter
August 2022