
Distributed Computing with Python PDF

171 Pages·2016·7.674 MB·English

Preview Distributed Computing with Python

Distributed Computing with Python

Harness the power of multiple computers using Python through this fast-paced informative guide

Francesco Pierfederici

BIRMINGHAM - MUMBAI

Distributed Computing with Python

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2016
Production reference: 1060416

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78588-969-1

www.packtpub.com

Credits

Author: Francesco Pierfederici
Reviewer: James King
Commissioning Editor: Veena Pagare
Acquisition Editor: Aaron Lazar
Content Development Editor: Parshva Sheth
Technical Editor: Abhishek R. Kotian
Copy Editor: Neha Vyas
Project Coordinator: Nikhil Nair
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Disha Haria
Production Coordinator: Melwyn Dsa
Cover Work: Melwyn Dsa

About the Author

Francesco Pierfederici is a software engineer who loves Python. He has been working in the fields of astronomy, biology, and numerical weather forecasting for the last 20 years.

He has built large distributed systems that make use of tens of thousands of cores at a time and run on some of the fastest supercomputers in the world. He has also written a lot of applications of dubious usefulness but that are great fun. Mostly, he just likes to build things.

I would like to thank my wife, Alicia, for her unreasonable patience during the gestation of this book. I would also like to thank Parshva Sheth and Aaron Lazar at Packt Publishing and the technical reviewer, James King, who were all instrumental in making this a better book.

About the Reviewer

James King is a software developer with a broad range of experience in distributed systems. He is a contributor to many open source projects including OpenStack and Mozilla Firefox. He enjoys mathematics, horsing around with his kids, games, and art.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via a web browser

Table of Contents

Preface

Chapter 1: An Introduction to Parallel and Distributed Computing
    Parallel computing
    Distributed computing
    Shared memory versus distributed memory
    Amdahl's law
    The mixed paradigm
    Summary

Chapter 2: Asynchronous Programming
    Coroutines
    An asynchronous example
    Summary

Chapter 3: Parallelism in Python
    Multiple threads
    Multiple processes
    Multiprocess queues
    Closing thoughts
    Summary

Chapter 4: Distributed Applications – with Celery
    Establishing a multimachine environment
    Installing Celery
    Testing the installation
    A tour of Celery
    More complex Celery applications
    Celery in production
    Celery alternatives – Python-RQ
    Celery alternatives – Pyro
    Summary

Chapter 5: Python in the Cloud
    Cloud computing and AWS
    Creating an AWS account
    Creating an EC2 instance
    Storing data in Amazon S3
    Amazon elastic beanstalk
    Creating a private cloud
    Summary

Chapter 6: Python on an HPC Cluster
    Your typical HPC cluster
    Job schedulers
    Running a Python job using HTCondor
    Running a Python job using PBS
    Debugging
    Summary

Chapter 7: Testing and Debugging Distributed Applications
    The big picture
    Common problems – clocks and time
    Common problems – software environments
    Common problems – permissions and environments
    Common problems – the availability of hardware resources
    Challenges – the development environment
    A useful strategy – logging everything
    A useful strategy – simulating components
    Summary

Chapter 8: The Road Ahead
    The first two chapters
    The tools
    The cloud and the HPC world
    Debugging and monitoring
    Where to go next

Index

Preface

Parallel and distributed computing is a fascinating subject that only a few years ago developers in only a very few large companies and national labs were privy to. Things have changed dramatically in the last decade or so, and now everybody can build small- and medium-scale distributed applications in a variety of programming languages including, of course, our favorite one: Python.

This book is a very practical guide for Python programmers who are starting to build their own distributed systems. It starts off by illustrating the bare minimum theoretical concepts needed to understand parallel and distributed computing in order to lay the basic foundations required for the rest of the (more practical) chapters.

It then looks at some first examples of parallelism using nothing more than modules from the Python standard library. The next step is to move beyond the confines of a single computer and start using more and more nodes. This is accomplished using a number of third-party libraries, including Celery and Pyro.

The remaining chapters investigate a few deployment options for our distributed applications. The cloud and classic High Performance Computing (HPC) clusters, together with their strengths and challenges, take center stage.

Finally, the thorny issues of monitoring, logging, profiling, and debugging are touched upon.

All in all, this is very much a hands-on book, teaching you how to use some of the most common frameworks and methodologies to build parallel and distributed systems in Python.
