Learn Amazon SageMaker: A guide to building, training, and deploying machine learning models for developers and data scientists, 2nd Edition

554 pages · 2021 · English
by Julien Simon

Learn Amazon SageMaker
Second Edition
A guide to building, training, and deploying machine learning models for developers and data scientists
Julien Simon
BIRMINGHAM—MUMBAI

Learn Amazon SageMaker
Second Edition
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Ali Abidi
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Nair
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Joshua Misquitta

First published: August 2020
Second edition: November 2021
Production reference: 2191121

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-80181-795-0

www.packt.com

Contributors

About the author

Julien Simon is a principal developer advocate for AI and Machine Learning (ML) at Amazon Web Services (AWS). He focuses on helping developers and enterprises bring their ideas to life. He frequently speaks at conferences, blogs on the AWS Blog, as well as on Medium, and he also runs an AI/ML podcast.
Prior to joining AWS, Julien served as the CTO/VP of engineering in top-tier web start-ups over a period of 10 years, where he led large software and ops teams in charge of thousands of servers worldwide. In the process, he fought his way through a wide range of technical, business, and procurement issues, which helped him gain a deep understanding of physical infrastructure, its limitations, and how cloud computing can help.

About the reviewers

Antje Barth is a principal developer advocate for AI and ML at AWS, based in Düsseldorf, Germany. Antje is the co-author of the O'Reilly book, Data Science on AWS, the co-founder of the Düsseldorf chapter of Women in Big Data, and frequently speaks at AI and ML conferences and meetups around the world. She also chairs and curates content for O'Reilly AI Superstream events. Previously, Antje was an engineer at Cisco and MapR, focused on data center technologies, cloud computing, big data, and AI applications.

Brent Rabowsky is a principal data science consultant at AWS with over 10 years' experience in the field of ML. At AWS, he leverages his expertise to help AWS customers with their data science projects. Prior to AWS, he joined Amazon.com on an ML and algorithms team and previously worked on conversational AI agents for a government contractor and a research institute. He has also served as a technical reviewer of the books Data Science on AWS, by Chris Fregly and Antje Barth, published by O'Reilly, and SageMaker Best Practices, published by Packt.

Mia Champion is a HealthAI leader passionate about transformative technologies and strategic markets in the areas of life sciences, healthcare, ML/AI, and cloud computing. She has both a technical and entrepreneurial skillset that includes experience as a principal research scientist, cloud computing architect and developer, new business developer, and business strategist.
Table of Contents

Preface

Section 1: Introduction to Amazon SageMaker

1 Introducing Amazon SageMaker
  Technical requirements
  Exploring the capabilities of Amazon SageMaker
  The main capabilities of Amazon SageMaker
  The Amazon SageMaker API
  Setting up Amazon SageMaker on your local machine
  Installing the SageMaker SDK with virtualenv
  Installing the SageMaker SDK with Anaconda
  A word about AWS permissions
  Setting up Amazon SageMaker Studio
  Onboarding to Amazon SageMaker Studio
  Onboarding with the quick start procedure
  Deploying one-click solutions and models with Amazon SageMaker JumpStart
  Deploying a solution
  Deploying a model
  Fine-tuning a model
  Summary

2 Handling Data Preparation Techniques
  Technical requirements
  Labeling data with Amazon SageMaker Ground Truth
  Using workforces
  Creating a private workforce
  Uploading data for labeling
  Creating a labeling job
  Labeling images
  Labeling text
  Transforming data with Amazon SageMaker Data Wrangler
  Loading a dataset in SageMaker Data Wrangler
  Transforming a dataset in SageMaker Data Wrangler
  Exporting a SageMaker Data Wrangler pipeline
  Running batch jobs with Amazon SageMaker Processing
  Discovering the Amazon SageMaker Processing API
  Processing a dataset with scikit-learn
  Processing a dataset with your own code
  Summary

Section 2: Building and Training Models

3 AutoML with Amazon SageMaker Autopilot
  Technical requirements
  Discovering Amazon SageMaker Autopilot
  Analyzing data
  Feature engineering
  Model tuning
  Using Amazon SageMaker Autopilot in SageMaker Studio
  Launching a job
  Monitoring a job
  Comparing jobs
  Deploying and invoking a model
  Using the SageMaker Autopilot SDK
  Launching a job
  Monitoring a job
  Cleaning up
  Diving deep on SageMaker Autopilot
  The job artifacts
  The data exploration notebook
  The candidate generation notebook
  Summary

4 Training Machine Learning Models
  Technical requirements
  Discovering the built-in algorithms in Amazon SageMaker
  Supervised learning
  Unsupervised learning
  A word about scalability
  Training and deploying models with built-in algorithms
  Understanding the end-to-end workflow
  Using alternative workflows
  Using fully managed infrastructure
  Using the SageMaker SDK with built-in algorithms
  Preparing data
  Configuring a training job
  Launching a training job
  Deploying a model
  Cleaning up
  Working with more built-in algorithms
  Regression with XGBoost
  Recommendation with Factorization Machines
  Using Principal Component Analysis
  Detecting anomalies with Random Cut Forest
  Summary

5 Training CV Models
  Technical requirements
  Discovering the CV built-in algorithms in Amazon SageMaker
  Discovering the image classification algorithm
  Discovering the object detection algorithm
  Discovering the semantic segmentation algorithm
  Training with CV algorithms
  Preparing image datasets
  Working with image files
  Working with RecordIO files
  Working with SageMaker Ground Truth files
  Using the built-in CV algorithms
  Training an image classification model
  Fine-tuning an image classification model
  Training an object detection model
  Training a semantic segmentation model
  Summary

6 Training Natural Language Processing Models
  Technical requirements
  Discovering the NLP built-in algorithms in Amazon SageMaker
  Discovering the BlazingText algorithm
  Discovering the LDA algorithm
  Discovering the NTM algorithm
  Discovering the seq2seq algorithm
  Training with NLP algorithms
  Preparing natural language datasets
  Preparing data for classification with BlazingText
  Preparing data for classification with BlazingText, version 2
  Preparing data for word vectors with BlazingText
  Preparing data for topic modeling with LDA and NTM
  Using datasets labeled with SageMaker Ground Truth
  Using the built-in algorithms for NLP
  Classifying text with BlazingText
  Computing word vectors with BlazingText
  Using BlazingText models with FastText
  Modeling topics with LDA
  Modeling topics with NTM
  Summary

7 Extending Machine Learning Services Using Built-In Frameworks
  Technical requirements
  Discovering the built-in frameworks in Amazon SageMaker
  Running a first example with XGBoost
  Working with framework containers
  Training and deploying locally
  Training with script mode
  Understanding model deployment
  Managing dependencies
  Putting it all together
  Running your framework code on Amazon SageMaker
  Using the built-in frameworks
  Working with TensorFlow and Keras
  Working with PyTorch
  Working with Hugging Face
  Working with Apache Spark
  Summary

8 Using Your Algorithms and Code
  Technical requirements
  Understanding how SageMaker invokes your code
  Customizing an existing framework container
  Setting up your build environment on EC2
  Building training and inference containers
  Using the SageMaker Training Toolkit with scikit-learn
  Building a fully custom container for scikit-learn
  Training with a fully custom container
  Deploying a fully custom container
  Building a fully custom container for R
  Coding with R and plumber
  Building a custom container
  Training and deploying a custom container on SageMaker
  Training and deploying with your own code on MLflow
  Installing MLflow
  Training a model with MLflow
  Building a SageMaker container with MLflow
  Building a fully custom container for SageMaker Processing
  Summary

Section 3: Diving Deeper into Training

9 Scaling Your Training Jobs
  Technical requirements
  Understanding when and how to scale
  Understanding what scaling means
  Adapting training time to business requirements
  Right-sizing training infrastructure
  Deciding when to scale
  Deciding how to scale
  Scaling a BlazingText training job
  Monitoring and profiling training jobs with Amazon SageMaker Debugger
  Viewing monitoring and profiling information in SageMaker Studio
