Foundations of Scalable Systems
Designing Distributed Architectures

With Early Release ebooks, you get books in their earliest form (the author’s raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles.

Ian Gorton

Foundations of Scalable Systems
by Ian Gorton

Copyright © Ian Gorton. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Acquisitions Editor: Melissa Duffield
Development Editor: Virginia Wilson
Production Editor: Daniel Elfanbaum
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

August 2022: First Edition

Revision History for the Early Release
2021-05-10: First Release
2021-06-22: Second Release
2021-09-08: Third Release
2021-10-04: Fourth Release
2021-11-19: Fifth Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098106065 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Foundations of Scalable Systems, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-10599-0

[LSI]

Preface

A NOTE FOR EARLY RELEASE READERS

With Early Release ebooks, you get books in their earliest form (the author’s raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles.

If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at [email protected].

This book is built around the thesis that the ability of software systems to operate at scale is increasingly a driving system quality. As our world becomes more interconnected, this trend will only accelerate. Hence the goal of the book is to provide the reader with the core knowledge of distributed and concurrent systems. It also introduces a collection of software architecture approaches and distributed technologies that can be used to build scalable systems.

Why Scalability?

The pace of change in our world is daunting. Innovations appear daily, creating new capabilities for us all to interact, conduct business, be entertained, and even end pandemics. The fuel for much of this innovation is software, written by veritable armies of developers in major internet companies, small crack teams in startups, and teams of all shapes and sizes in between.

Delivering software systems that are responsive to user needs is difficult enough, but it becomes an order of magnitude more difficult for systems at scale. We all know of systems that fail suddenly when exposed to unexpectedly high loads. Such situations are, at a minimum, bad publicity for organizations, and at worst they can cost jobs and destroy companies.
Software is unlike physical systems in that it’s amorphous: its physical form (1s and 0s) bears no resemblance to its actual capabilities. We’d never expect to transform a small village of 500 people into a city of 10 million overnight. But we sometimes expect our software systems to suddenly handle 1,000x the number of requests they were designed for. Not surprisingly, the outcomes are rarely pretty.

Who This Book Is For

The major target audience for this book is software engineers and architects who have little or no experience with distributed, concurrent systems. They need to deepen both their theoretical and practical design knowledge in order to meet the challenges of building larger-scale, typically internet-facing applications.

Much of the content of this book has been developed in the context of an advanced undergraduate/graduate course at Northeastern University. It has proven a very popular and effective approach for equipping students with the knowledge and skills needed to launch their careers with major internet companies. Additional materials on the book website are available to support educators who wish to use the book for their courses.

What You Will Learn

This book covers the landscape of concurrent and distributed systems through the lens of scalability. While it’s impossible to totally divorce scalability from other architectural qualities, scalability is the main focus of discussion. Of course, other qualities necessarily come into play, with performance, availability, and consistency regularly raising their heads.

Building distributed systems requires a fundamental understanding of distribution and concurrency, and this knowledge is a recurrent theme throughout this book. It’s needed because, at their core, distributed systems have two problems that make them complex, as I describe below.

First, although systems operate correctly nearly all the time, an individual part of the system may fail at any time.
When a component fails (a hardware crash, a network outage, a bug in a server), we have to employ techniques that enable the system as a whole to continue operations and recover from failures. And any distributed system will experience component failure, often in weird, mysterious, and unanticipated ways.

Second, creating a scalable distributed system requires the coordination of multiple moving parts. Each component of the system needs to keep its part of the bargain and process requests as quickly as possible. If just one component causes requests to be delayed, the whole system may perform poorly and even eventually crash.

To deal with these problems, there is a rich, deep body of literature available to draw on. And luckily for us engineers, there’s also an extensive collection of technologies designed to help us build distributed systems that are fault tolerant and scalable. These technologies embody theoretical approaches and complex algorithms that are incredibly hard to build correctly. Using these platform-level, widely applicable technologies, our applications can stand on the shoulders of giants, enabling us to build sophisticated business solutions.

Specifically, readers of this book will learn:

The fundamental characteristics of distributed systems, including state management, time coordination, concurrency, communications, and coordination
Architectural approaches and supporting technologies for building scalable, robust services
How distributed databases operate and can be used to build scalable distributed systems
Architectures and technologies such as Apache Kafka and Flink for building streaming, event-based systems

Part I. Scalability in Modern Software Systems

The first four chapters in Part I of this book motivate the need for scalability as a key architectural attribute in modern software systems. The chapters provide broad coverage of the basic mechanisms for achieving scalability, the fundamental characteristics of distributed systems, and an introduction to concurrent programming. This knowledge lays the foundation for what follows, and if you are new to the area of distributed, concurrent systems, you’ll need to spend some time on these four chapters. They will make the rest of the book much easier to digest.

Chapter 1. Introduction to Scalable Systems

A NOTE FOR EARLY RELEASE READERS

This will be the 1st chapter of the final book.

The last 20 years have seen unprecedented growth in the size, complexity, and capacity of software systems. This rate of growth is hardly likely to slow in the next 20 years, and what these future systems will look like is close to unimaginable right now. The one thing we can guarantee is that more and more software systems will need to be built with constant growth (more requests, more data, more analysis) as a primary design driver.

Scalable is the term used in software engineering to describe software systems that can accommodate growth. In this chapter I’ll explore what precisely is meant by the ability to scale (known, not surprisingly, as scalability).
I’ll also describe a few examples that put hard numbers on the capabilities and characteristics of contemporary applications and give a brief history of the origins of the massive systems we routinely build today. Finally, I’ll describe two general principles for achieving scalability, namely replication and optimization, that will recur in various forms throughout the