Python for Probability, Statistics, and Machine Learning PDF

524 Pages·2022·8.262 MB·English

Preview Python for Probability, Statistics, and Machine Learning

José Unpingco
Python for Probability, Statistics, and Machine Learning
Third Edition

José Unpingco
San Diego, CA, USA

ISBN 978-3-031-04647-6
ISBN 978-3-031-04648-3 (eBook)
https://doi.org/10.1007/978-3-031-04648-3

1st edition: © Springer International Publishing Switzerland 2016
2nd edition: © Springer Nature Switzerland AG 2019, corrected publication 2019
3rd edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Irene, Nicholas, and Daniella, for all their patient support.

Preface to the Third Edition

This third edition is updated for Python version 3.8+ but does not use any new syntax and should be compatible with Python 3.6+ also. More importantly, many existing sections have been revised based on feedback from the first and second versions. The book has been adopted into university-level curricula in data science and machine learning worldwide, including the University of California, San Diego. It has also been translated into multiple languages. With this in mind, I reedited significant portions for clarity to hopefully ease the translation burden of this edition and make it easier to understand overall. Almost all the figures have been updated for clarity.

The statistics chapter has doubled in size and now covers important but hard-to-find material, such as categorical data analysis and missing data imputation. The machine learning chapter has been updated, and new sections covering gradient tree boosting have been added, along with a section on interpreting machine learning models. The introduction now includes a discussion of the Xarray module for multidimensional dataframes. Overall, the book is now about one-third larger than the second edition.

As before, there are more Programming Tips that illustrate effective Python modules and methods for scientific programming and machine learning. There are over 650 runnable code blocks that have been tested for accuracy, so you can try these out for yourself in your own code. This edition features over 200 graphical visualizations generated using Python that illustrate the concepts that are developed both in code and in mathematics. We also discuss and use key Python modules, such as Numpy, Scikit-learn, Sympy, Scipy, Lifelines, CvxPy, Theano, Matplotlib, Pandas, Tensorflow, Statsmodels, Xarray, Seaborn, and Keras.

As with the first and second editions, all of the key concepts are developed mathematically and are reproducible in the given Python, to provide the reader multiple perspectives on the material. This book is not designed to be exhaustive and reflects the author’s eclectic industrial background. The focus remains on concepts and fundamentals for day-to-day work using Python in the most expressive way possible. You can reach the author with comments at github.com/unpingco by opening an issue on the project.

Acknowledgements

I would like to acknowledge the Python community as a whole, for all their contributions that made this book possible. Hans Petter Langtangen was the author of the Doconce [22] document preparation system that was used to write this text. Thanks to Geoffrey Poore [36] for his work with PythonTeX and, both key technologies were used to produce this book.

San Diego, CA, USA
José Unpingco
March, 2022

Preface to the Second Edition

This second edition is updated for Python version 3.6+. Furthermore, many existing sections have been revised for clarity based on feedback from the first version. The book is now over 30 percent larger than the original with new material about important probability distributions, including key derivations and illustrative code samples. Additional important statistical tests are included in the statistics chapter, including the Fisher exact test and the Mann-Whitney-Wilcoxon test. A new section on survival analysis has been included. The most significant addition is the section on deep learning for image processing with a detailed discussion of gradient descent methods that underpin all deep learning work. There is also substantial discussion regarding generalized linear models.
As before, there are more Programming Tips that illustrate effective Python modules and methods for scientific programming and machine learning. There are 445 runnable code blocks that have been tested for accuracy, so you can try these out for yourself in your own code. Over 158 graphical visualizations (almost all generated using Python) illustrate the concepts that are developed both in code and in mathematics. We also discuss and use key Python modules, such as Numpy, Scikit-learn, Sympy, Scipy, Lifelines, CvxPy, Theano, Matplotlib, Pandas, Tensorflow, Statsmodels, and Keras.

As with the first edition, all of the key concepts are developed mathematically and are reproducible in Python, to provide the reader multiple perspectives on the material. As before, this book is not designed to be exhaustive and reflects the author’s eclectic industrial background. The focus remains on concepts and fundamentals for day-to-day work using Python in the most expressive way possible.

Preface to the First Edition

This book will teach you the fundamental concepts that underpin probability and statistics and illustrate how they relate to machine learning via the Python language and its powerful extensions. This is not a good first book in any of these topics, because we assume that you have already had a decent undergraduate-level introduction to probability and statistics. Furthermore, we also assume that you have a good grasp of the basic mechanics of the Python language itself. Having said that, this book is appropriate if you have this basic background and want to learn how to use the scientific Python toolchain to investigate these topics. On the other hand, if you are comfortable with Python, perhaps through working in another scientific field, then this book will teach you the fundamentals of probability and statistics and how to use these ideas to interpret machine learning methods.
Likewise, if you are a practicing engineer using a commercial package (e.g., Matlab, IDL), then you will learn how to effectively use the scientific Python toolchain by reviewing concepts you are already familiar with.

The most important feature of this book is that everything in it is reproducible using Python. Specifically, all of the code, all of the figures, and (most of) the text are available in the downloadable supplementary materials that correspond to this book as IPython Notebooks. IPython Notebooks are live interactive documents that allow you to change parameters, recompute plots, and generally tinker with all of the ideas and code in this book. I urge you to download these IPython Notebooks and follow along with the text to experiment with the topics covered. I guarantee doing this will boost your understanding, because the IPython Notebooks allow for interactive widgets, animations, and other intuition-building features that help make many of these abstract ideas concrete. As an open-source project, the entire scientific Python toolchain, including the IPython Notebook, is freely available. Having taught this material for many years, I am convinced that the only way to learn is to experiment as you go. The text provides instructions on how to get started installing and configuring your scientific Python environment.

This book is not designed to be exhaustive and reflects the author’s eclectic background in industry. The focus is on fundamentals and intuitions for day-to-day work, especially when you must explain the results of your methods to a nontechnical audience. We have tried to use the Python language in the most expressive way possible while encouraging good Python coding practices.
