ebook img

Data Science with Julia PDF

241 Pages·2019·5.84 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Data Science with Julia

Data Science with Julia Data Science with Julia By Paul D. McNicholas and Peter A. Tait CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2019 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed on acid-free paper Version Date: 20191119 International Standard Book Number-13: 978-1-138-49998-0 (Paperback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www. copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Names: McNicholas, Paul D., author. | Tait, Peter A., author. Title: Data science with Julia / Paul D. McNicholas, Peter A. Tait. Description: Boca Raton : Taylor & Francis, CRC Press, 2018. | Includes bibliographical references and index. Identifiers: LCCN 2018025237 | ISBN 9781138499980 (pbk.) Subjects: LCSH: Julia (Computer program language) | Data structures (Computer science) Classification: LCC QA76.73.J85 M37 2018 | DDC 005.7/3--dc23 LC record available at https://lccn.loc.gov/2018025237 Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com For Oscar, who tries and tries. PDM To my son, Xavier, Gettin’ after it does pay off. PAT Contents Chapter 1(cid:4) Introduction 1 1.1 DATASCIENCE 1 1.2 BIGDATA 4 1.3 JULIA 5 1.4 JULIAANDRPACKAGES 6 1.5 DATASETS 6 1.5.1 Overview 6 1.5.2 Beer Data 6 1.5.3 Coffee Data 7 1.5.4 Leptograpsus Crabs Data 8 1.5.5 Food Preferences Data 9 1.5.6 x2 Data 9 1.5.7 Iris Data 11 1.6 OUTLINE OF THE CONTENTS OF THIS MONOGRAPH 11 Chapter 2(cid:4) Core Julia 13 2.1 VARIABLENAMES 13 2.2 OPERATORS 14 2.3 TYPES 15 2.3.1 Numeric 15 2.3.2 Floats 17 2.3.3 Strings 19 2.3.4 Tuples 22 2.4 DATASTRUCTURES 23 2.4.1 Arrays 23 vii viii (cid:4) Contents 2.4.2 Dictionaries 26 2.5 CONTROLFLOW 28 2.5.1 Compound Expressions 28 2.5.2 Conditional Evaluation 29 2.5.3 Loops 30 2.5.3.1 Basics 30 2.5.3.2 Loop termination 32 2.5.3.3 Exception handling 33 2.6 FUNCTIONS 36 Chapter 3(cid:4) Working with Data 43 3.1 DATAFRAMES 43 3.2 CATEGORICALDATA 47 3.3 INPUT/OUTPUT 48 3.4 USEFULDATAFRAMEFUNCTIONS 54 3.5 SPLIT-APPLY-COMBINESTRATEGY 56 3.6 QUERY.JL 59 Chapter 4(cid:4) Visualizing Data 67 4.1 GADFLY.JL 67 4.2 VISUALIZINGUNIVARIATEDATA 69 4.3 DISTRIBUTIONS 72 4.4 VISUALIZINGBIVARIATEDATA 83 4.5 ERRORBARS 90 4.6 FACETS 91 4.7 SAVINGPLOTS 91 Chapter 5(cid:4) Supervised Learning 93 5.1 INTRODUCTION 93 5.2 CROSS-VALIDATION 96 5.2.1 Overview 96 5.2.2 K-Fold Cross-Validation 97 5.3 K-NEARESTNEIGHBOURSCLASSIFICATION 99 5.4 CLASSIFICATIONANDREGRESSIONTREES 102 Contents (cid:4) ix 5.4.1 Overview 102 5.4.2 Classification Trees 103 5.4.3 Regression Trees 106 5.4.4 Comments 108 5.5 BOOTSTRAP 108 5.6 RANDOMFORESTS 111 5.7 GRADIENTBOOSTING 113 5.7.1 Overview 113 5.7.2 Beer Data 116 5.7.3 Food Data 121 5.8 COMMENTS 126 Chapter 6(cid:4) Unsupervised Learning 129 6.1 INTRODUCTION 129 6.2 PRINCIPALCOMPONENTSANALYSIS 132 6.3 PROBABILISTIC PRINCIPAL COMPONENTS ANALYSIS 135 6.4 EMALGORITHMFORPPCA 137 6.4.1 Background: EM Algorithm 137 6.4.2 E-step 138 6.4.3 M-step 139 6.4.4 Woodbury Identity 140 6.4.5 Initialization 141 6.4.6 Stopping Rule 141 6.4.7 Implementing the EM Algorithm for PPCA 142 6.4.8 Comments 146 6.5 K-MEANSCLUSTERING 148 6.6 MIXTUREOFPROBABILISTICPRINCIPALCOM- PONENTSANALYZERS 151 6.6.1 Model 151 6.6.2 Parameter Estimation 152 6.6.3 Illustrative Example: Coffee Data 161 6.7 COMMENTS 162

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.