
An Introduction to Statistics with Python: With Applications in the Life Sciences PDF

341 Pages·2022·6.934 MB·English

Preview An Introduction to Statistics with Python: With Applications in the Life Sciences

Statistics and Computing
Series Editor: Wolfgang Karl Härdle, Humboldt-Universität zu Berlin, Berlin, Germany
Statistics and Computing (SC) includes monographs and advanced texts on statistical computing and statistical packages.
More information about this series at https://link.springer.com/bookseries/3022

Thomas Haslwanter
An Introduction to Statistics with Python
With Applications in the Life Sciences
Second Edition

Thomas Haslwanter
School of Medical Engineering and Applied Social Sciences
University of Applied Sciences Upper Austria
Linz, Austria

ISSN 1431-8784, ISSN 2197-1706 (electronic)
Statistics and Computing
ISBN 978-3-030-97370-4, ISBN 978-3-030-97371-1 (eBook)
https://doi.org/10.1007/978-3-030-97371-1

1st edition: © Springer International Publishing Switzerland 2016
2nd edition: © Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my two-, three-, and four-legged household companions: Jean, Felix, and his sister Jessica: Thank you so much for all the support you have provided over the years!

Preface

Preface to the First Edition

In the data analysis for my own research work, I was often slowed down by two things: (1) I did not know enough statistics, and (2) the books available would provide a theoretical background, but no real practical help. The book you are holding in your hands (or on your tablet or laptop) is intended to be the book that will solve this very problem. It is designed to provide enough basic understanding so that you know what you are doing, and it should equip you with the tools you need. I believe that the Python solutions provided in this book for the most basic statistical problems address at least 90% of the problems that most physicists, biologists, and medical doctors encounter in their work. So if you are the typical graduate student working on your degree, or a medical researcher analyzing your latest experiments, chances are that you will find the tools you require here, explanation and source code included.

This is the reason I have focused on statistical basics and hypothesis tests in this book, and refer only briefly to other statistical approaches. I am well aware that most of the tests presented in this book can also be carried out using statistical modeling. But in many cases, this is not the methodology used in many life science journals. Advanced statistical analysis goes beyond the scope of this book, and (to be frank) exceeds my own knowledge of statistics.

My motivation for providing the solutions in Python is based on two considerations. One is that I would like them to be available to everyone. While commercial solutions like Matlab, SPSS, Minitab etc. offer powerful tools, most people can only use them legally in an academic setting. In contrast, Python is completely free ("as in free beer" is often heard in the Python community). The second reason is that Python is the most beautiful coding language that I have yet encountered; and around 2010 Python and its documentation matured to the point where one can use it without being a serious coder. Together, this book, Python, and the tools that the Python ecosystem offers today provide a beautiful, free package that covers all the statistics that most researchers will need in their lifetime.

Preface to the Second Edition

Since the publication of the first edition, Python has continuously gained popularity and become firmly established as one of the foremost programming languages for statistical data analysis. All the core packages have matured. And thanks to the stunning development of Jupyter as an interactive programming environment, Python has become even more accessible for people with little programming background. To reflect these developments, and to incorporate the suggestions I have received for improving the presentation of the material, Springer has given me the opportunity to bring out a new edition of Introduction to Statistics with Python.

Compared to the first edition, the following changes have been made:
• The package pandas and its DataFrames have become an integral part of scientific Python, as has the Jupyter framework for interactive data environments. Correspondingly, more space has been dedicated to their introduction.
• A new package, pingouin, is promising a simplified and more powerful interface for many common statistics functions. This package is introduced, and many application examples have been added.
• The visualization of data has been expanded, including the preparation of publication-ready graphics.
• The design of experiments and power analyses are discussed in more detail.
• A new section has been added on the confidence intervals of frequently used statistical parameters.
• A new chapter has been added on finding patterns in data, including an introduction to the correlation coefficient, cross- and autocorrelation. For an application of these concepts, a short introduction is given to time series analysis.

As for the first edition, all examples and solutions from this book are again available online. This includes code samples and example programs, Jupyter Notebooks with additional or extended information, as well as the data and Python code used to generate most of the figures. They can be downloaded from https://github.com/thomas-haslwanter/statsintro-python-2e.

I hope this book will help you with the statistical analysis of your data, and convey some of the often really simple ideas behind the sometimes awkwardly named statistical analysis procedures.

For Whom This Book Is

This book assumes that
• you have some basic programming experience: If you have done no programming previously, you may want to start out with Python using some of the great links provided in the text. Starting programming and starting statistics may be a bit much all at once. However, solutions provided to the exercises at the end of most chapters should help you to get up to speed with Python.
• you are not a statistics expert: If you have advanced statistics experience, the online help in Python and the Python packages may be sufficient to allow you to do most of your data analysis right away. This book may still help you to get started with Python. However, the book concentrates on the basic ideas of statistics and on hypothesis tests, and only the last part introduces linear regression modeling and Bayesian statistics.

This book is designed to give you all (or at least most of) the tools that you will need for statistical data analysis. I attempt to provide the background you need to understand what you are doing. I do not prove any theorems, and do not apply mathematics unless necessary. For all tests, a working Python program is provided. In principle, you just have to define your problem, select the corresponding program, and adapt it to your needs. This should allow you to get going quickly, even if you have little Python experience.
This is also the reason why I have not provided the software as one single Python package; I expect that you will have to tailor each program to your specific setup (data format, etc.).

This book is organized into three parts:

Part I gives an introduction to Python: how to set it up, simple programs to get started, and tips on how to avoid some common mistakes. It also shows how to read data from different sources into Python, and how to visualize statistical data.

Part II provides an introduction to statistical analysis: how to design a study, power analysis, and how best to analyze data; probability distributions; and an overview of the most important hypothesis tests. Even though modern statistics is firmly based in statistical modeling, hypothesis tests still seem to dominate the life sciences. For each test, a Python program is provided that shows how the test can be implemented.

Part III provides an introduction to correlation and regression analysis, time series analysis, and statistical modeling, and a look at advanced statistical analysis procedures. I have also included tests on discrete data in this section, such as logistic regression, as they utilize “generalized linear models” which I regard as advanced. This part ends with a presentation of the basic ideas of Bayesian statistics.

To achieve all those goals as quickly as possible, Appendix A of the book provides hints on how to efficiently develop correct and working code. This should get you to the point where you can get things done quickly.

Acknowledgments

Python is built on the contributions from the user community, and some of the sections in this book are based on some of the excellent information available on the web. (Permission has been granted by the authors to reprint their contributions here.)

I especially want to thank the following people:
• Christiane Takacs helped me enormously by polishing the introductory statistics sections.
• Connor Johnson wrote a very nice blog explaining the results of the statsmodels OLS command, which provided the basis for the section on Statistical Models.
• Cam Davidson Pilon wrote the excellent open-source e-book Probabilistic-Programming-and-Bayesian-Methods-for-Hackers. From there, I took the example of the Challenger disaster to demonstrate Bayesian statistics.
• Fabian Pedregosa’s blog on ordinal logistic regression allowed me to include this topic, which otherwise would be admittedly beyond my own skills.

I also want to thank Springer Publishing for the chance to bring out the second edition of this book, and to base the three introductory chapters (Python, Data Import, and Data Display) to a significant part on the corresponding chapters of my book Hands-on Signal Analysis with Python.

If you have a suggestion or correction, please send an email to my work address [email protected], I will add you to the list of contributors unless advised otherwise. If you include at least part of the sentence the error appears in, that makes it easy for me to search. Page and section numbers are fine, too, but not as easy to work with. Thanks!

Linz, Austria
Thomas Haslwanter
August 2022
