Programming Google App Engine with Python
by Dan Sanderson

Copyright © 2015 Dan Sanderson. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editors: Meghan Blanchette and Brian Anderson
Acquisition Editor: Mike Loukides
Production Editors: Colleen Lobner and Kara Ebrahim
Copyeditor: Jasmine Kwityn
Proofreader: Charles Roumeliotis
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

June 2015: First Edition

Revision History for the First Edition
2015-06-17: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491900253 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Programming Google App Engine with Python, the cover image of a four-lined snake, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-49190025-3
[LSI]

Preface

On the Internet, popularity is swift and fleeting. A mention of your website on a popular news site can bring 300,000 potential customers your way at once, all expecting to find out who you are and what you have to offer. But if you’re a small company just starting out, your hardware and software aren’t likely to be able to handle that kind of traffic. You’ve sensibly built your site to handle the 30,000 visits per hour you’re actually expecting in your first six months. Under heavy load, such a system would be incapable of showing even your company logo to the 270,000 others that showed up to look around. And those potential customers are not likely to come back after the traffic has subsided.

The answer is not to spend time and money building a system to serve millions of visitors on the first day, when those same systems are only expected to serve mere thousands per day for the subsequent months. If you delay your launch to build big, you miss the opportunity to improve your product by using feedback from your customers. Building big early risks building something your customers don’t want.

Historically, small companies haven’t had access to large systems of servers on day one. The best they could do was to build small and hope that meltdowns wouldn’t damage their reputation as they try to grow. The lucky ones found their audience, got another round of funding, and halted feature development to rebuild their product for larger capacity. The unlucky ones, well, didn’t.

These days, there are other options. Large Internet companies such as Amazon.com, Google, and Microsoft are leasing parts of their high-capacity systems by using a pay-per-use model. Your website is served from those large systems, which are plenty capable of handling sudden surges in traffic and ongoing success. And because you pay only for what you use, there is no up-front investment that goes to waste when traffic is low.
As your customer base grows, the costs grow proportionally.

Google’s offering, collectively known as Google Cloud Platform, consists of a suite of high-powered services and tools: virtual machines in a variety of sizes, multiple forms of reliable data storage, configurable networking, automatic scaling infrastructure, and even the big data analysis tools that power Google’s products. But Google Cloud Platform does more than provide access to Google’s infrastructure. It encapsulates best practices for application architecture that have been honed by Google engineers for their own products.

The centerpiece of Google Cloud Platform is Google App Engine, an application hosting service that grows automatically. App Engine runs your application so that each user who accesses it gets the same experience as every other user, whether there are dozens of simultaneous users or thousands. Your application code focuses on each individual user’s experience. App Engine takes care of large-scale computing tasks (such as load balancing, data replication, and fault tolerance) automatically.

The scalable model really kicks in at the point where a traditional system would outgrow its first database server. With such a system, adding load-balanced web servers and caching layers can get you pretty far, but when your application needs to write data to more than one place, you face a difficult problem. This problem is made more difficult when development up to that point has relied on features of database software that were never intended for data distributed across multiple machines. By thinking about your data in terms of Cloud Platform’s model up front, you save yourself from having to rebuild the whole thing later.

Often overlooked as an advantage, App Engine’s execution model helps to distribute computation as well as data. App Engine excels at allocating computing resources to small tasks quickly.
This was originally designed for handling web requests from users, where generating a response for the client is the top priority. By combining this execution model with Cloud Platform’s task queue service, you can break medium-to-large computational tasks into chunks that are executed in parallel. Tasks are retried until they succeed, making tasks resilient in the face of service failures. The execution model encourages designs optimized for the parallelization and robustness provided by the platform.

Running on Google’s infrastructure means you never have to set up a server, replace a failed hard drive, or troubleshoot a network card. You don’t have to be woken up in the middle of the night by a screaming pager because an ISP hiccup confused a service alarm. And with automatic scaling, you don’t have to scramble to set up new hardware as traffic increases.

Google Cloud Platform and App Engine let you focus on your application’s functionality and user experience. You can launch early, enjoy the flood of attention, retain customers, and start improving your product with the help of your users. Your app grows with the size of your audience (up to Google-sized proportions) without having to rebuild for a new architecture. Meanwhile, your competitors are still putting out fires and configuring databases.

With this book, you will learn how to develop web applications that run on Google Cloud Platform, and how to get the most out of App Engine’s scalable execution model. A significant portion of the book discusses Google Cloud Datastore, a powerful data storage service that does not behave like the relational databases that have been a staple of web development for the past decade. The application model and the datastore together represent a new way of thinking about web applications that, while being almost as simple as the model we’ve known, requires reconsidering a few principles we often take for granted.
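The earlier point about breaking work into chunks, running the chunks in parallel, and retrying failed tasks can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard library's thread pool in Python 3 rather than the App Engine task queue service, every name in it (split_into_chunks, FlakyService, run_until_success, process_in_parallel) is hypothetical, and it caps the number of retries for safety, whereas the platform is described above as retrying tasks until they succeed.

```python
# Illustrative sketch only: standard library code, not the App Engine
# task queue API. All class and function names here are hypothetical.
import threading
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Break a large list of work items into smaller lists."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


class FlakyService(object):
    """Stands in for a backing service that fails the first time it
    sees each chunk, then succeeds on later attempts."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def total(self, chunk):
        key = tuple(chunk)
        with self._lock:
            first_time = key not in self._seen
            self._seen.add(key)
        if first_time:
            raise RuntimeError('transient failure')
        return sum(chunk)


def run_until_success(task, chunk, max_attempts=5):
    """Retry a task after transient failures, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # Give up and surface the last error.


def process_in_parallel(items, chunk_size, task, workers=4):
    """Run one retried task per chunk and return results in chunk order."""
    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_until_success, task, c) for c in chunks]
        return [f.result() for f in futures]
```

Calling process_in_parallel(list(range(10)), 3, FlakyService().total) returns the per-chunk sums [3, 12, 21, 9], even though every chunk fails on its first attempt. The design choice worth noticing is that each chunk is independent of the others, so a failure in one affects only that chunk's retry.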
A Brief History of App Engine

If you read all that, you may be wondering why this book is called Programming Google App Engine and not Programming Google Cloud Platform. The short answer is that the capabilities of the platform as a whole are too broad for one book. In particular, Compute Engine, the platform’s raw virtual machine capability, can do all kinds of stuff beyond serving web applications.

By some accounts (mine, at least), App Engine started as an early rendition of the Cloud Platform idea, and evolved and expanded to include large-scale and flexible-scale computing. When it first launched in 2008, App Engine hosted web applications written in Python, with APIs for a scalable datastore, a task queue service, and services for common features that lay outside of the “container” in which the app code would run (such as network access). A “runtime environment” for Java soon followed, capable of running web apps based on Java servlets using the same scalable infrastructure. Containerized app code, schemaless data storage, and service-oriented architecture proved to be not only a good way to build a scalable web app, but a good way to make reliability a key part of the App Engine product: no more pagers.

App Engine evolved continuously, with several major functionality milestones. One such milestone was a big upgrade for the datastore, using a new Paxos-based replication algorithm. The new algorithm changed the data consistency guarantees of the API, so it was released as an opt-in migration (including an automatic migration tool). Another major milestone was the switch from isolated request handlers billed by CPU usage to long-running application instances billed by instance uptime. With the upgraded execution model, app code could push “warm-up” work to occur outside of user request logic and exploit local memory caches.

Google launched Compute Engine as a separate product, a way to access computation on demand for general purposes.
With a Compute Engine VM, you can run any 64-bit Linux-based operating system and execute code written in any language compiled to (or interpreted by) that OS. Apps (running on App Engine or otherwise) can call into Compute Engine to start up any number of virtual machines, do work, and either shut down machines when no longer needed or leave them running in traditional or custom configurations.

App Engine and Compute Engine take different approaches to provide different capabilities. But these technologies are already starting to blend. In early 2014, Google announced Managed VMs, a new way to run VM-based code in an App Engine-like way. (This feature is not fully available as I write this, but check the Google Cloud Platform website for updates.) Overall, you’re able to adopt as much of the platform as you need to accomplish your goals, investing in flexibility when needed, and letting the platform’s automaticity handle the rest.

This book is being written at a turning point in App Engine’s history. Services that were originally built for App Engine are being generalized for Cloud Platform, and given REST APIs so you can call them from off the platform as well. App Engine development tools are being expanded, with a new universal Cloud SDK and Cloud Console. We’re even seeing the beginnings of new ways to develop and deploy software, with integrated Git-based source code revision control. As with any book about an evolving technology, what follows is a snapshot, with an emphasis on major concepts and long-lasting topics.

The focus of this book is building web applications using App Engine and related parts of the platform, especially Cloud Datastore. We’ll discuss services currently exclusive to App Engine, such as those for fetching URLs and sending email. We’ll also discuss techniques for organizing and optimizing your application, using task queues and offline processes, and otherwise getting the most out of Google App Engine.
Using This Book

Programming Google App Engine with Python covers App Engine’s runtime environment for the Python programming language. The Python runtime environment provides a fast interpreter for the Python language, and includes Python libraries for all of App Engine’s features. It is compatible with many major open source web application frameworks, such as Django and Flask.

App Engine supports three other runtime environments: Java, PHP, and Go. Java support includes a complete Java servlet environment, with a JVM capable of running bytecode produced by compilers for Java and other languages. The PHP environment runs a native PHP interpreter with the standard library and many extensions enabled, and is capable of running many off-the-shelf PHP applications such as WordPress and Drupal. With the Go runtime environment, App Engine compiles your Go code on the server and executes it at native CPU speeds.

The information contained in this book was formerly presented in a single volume, Programming Google App Engine, which also covered Java. To make it easy to find the information you need for your language, that book has been split into language-specific versions. You are reading the Python version. Programming Google App Engine with Java covers the same material using the Java language, as well as Java-specific topics.

We are considering PHP and Go versions of this book as a future endeavor. For now, the official App Engine documentation is the best resource for using these languages on the platform. If you’re interested in seeing versions of this book for PHP or Go, let us know by sending email to [email protected].

The book is organized so you can jump to the subjects that are most relevant to you. The introductory chapters provide a lay of the land, and get you working with a complete example that uses several features.
Subsequent chapters are arranged by App Engine’s various features, with a focus on efficient data storage and retrieval, communication, and distributed computation. Project life cycle topics such as deployment and maintenance are also covered.