
Practical Load Balancing: Ride the Performance Tiger PDF

270 Pages·2012·32.1 MB·English


For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.

Contents at a Glance

About the Authors
About the Technical Reviewers
Special Thanks to serverlove
Acknowledgments
Preface
Chapter 1: Introduction
Chapter 2: How Web Sites Work
Chapter 3: Content Caching: Keeping the Load Light
Chapter 4: DNS Load Balancing
Chapter 5: Content Delivery Networks
Chapter 6: Planning for Performance and Reliability
Chapter 7: Load Balancing Basics
Chapter 8: Load Balancing Your Web Site
Chapter 9: Load Balancing Your Database
Chapter 10: Network Load Balancing
Chapter 11: SSL Load Balancing
Chapter 12: Clustering for High Availability
Chapter 13: Load Balancing in the Cloud
Chapter 14: IPv6: Implications and Concepts
Chapter 15: Where to Go Next…
Index

Chapter 1: Introduction

The Internet, and in particular the World Wide Web, has effectively leveled the playing field for businesses and individuals around the world. Do you have a great idea for a web site or a service? Even the more advanced ideas can be relatively easily and cheaply realized without much in the way of initial outlay. You can get shared hosting for a few pennies each month, and dedicated servers (virtual or otherwise) are getting to the point where they are cheaper now than managed web hosting was just a few years ago.

This is all well and good, and being able to start out despite having just a few coins to rub together is certainly a great achievement. But what do you do when you want to take your application to the next step? What if it really starts to take off and you find you have the new Facebook or Amazon on your hands? What do you do then?

The Problem

Imagine you’ve come up with the next big thing. It’s so awesome and awe-inspiring that people are going to be talking about it for years. You’re pretty certain that you’ll be Time magazine’s Person of the Year. No doubt you’ll soon be on a first name basis with Bill Gates and Mark Zuckerberg. The future is bright!

You’ve spent months getting things ready for launch. You’ve tested and retested.
You’ve had friends and family visit the site and even run a few beta tests. You received rave reviews from all of them. The web site looks gorgeous, it’s simple to use, and everybody loves it. You know there is nothing more that you can do to improve the site so you decide that the time is right. You reach over and press the magic button: your web site is live and you sit back and wait for the acclaim that is rightfully yours!

At first, all goes well. You’ve had some feedback and people are happy. They’re telling their friends; more people are signing up. Now would be an appropriate time to start rubbing your hands in glee!

But something goes wrong. Someone just posted your web site’s address to Slashdot. All of a sudden everyone wants to see your site. You were serving a good 10 pages per second previously, but now your server is being hammered for ten thousand! You weren’t expecting this! Not only is the server not specced to support this sort of load, but you notice that previously fast requests are starting to back up as the database takes a hammering. People are starting to get error pages, some are getting timeouts, and some can’t even get that far!

Within 30 minutes it’s all over. Slashdot has now recorded that your site can’t take the load and everyone is saying how bad the site is. “It’s not stable,” they say. “Half the time it doesn’t load!” they complain. “What a complete waste of time!” they whine. It’s so unfair! Everything was going so well! This can’t be happening! All that time, effort, and money for nothing!

Note: The terms slashdotted and slashdotting have become synonymous with massive demand that no one is able to counter. The full force of Slashdot has been able to shutter the web sites of even the biggest and best funded companies. Although you might not be able to withstand such an onslaught, you can prepare your site for a potential spike in traffic without having to break the bank.
Your web site will be more robust and snappier for your visitors. And if you are lucky enough to capture the attention of Slashdot, you will be in a much better position to weather the storm!

The Solution

Okay, that example is somewhat contrived, but hey, it was fun to write and it’s based on fact. Don’t forget that WWW stands for World Wide Web and there are millions of people using it all day, every day. If you attract just an infinitesimally tiny fraction of those people at the same time, you can expect an awful lot of visitors. Most web sites respond well when one or two people visit them, but have you ever tried simulating 100,000 visitors and then tried to look at your site? These sorts of things do happen, and most of the time people get caught out.

(Peter here: For example, I was consulting for a large company that sells various types of fluffy toys online. Now, I’m not entirely sure how you become a large company by selling fluffy toys but I guess they know something that I don’t! Anyway, the problem was that like most businesses they had busy and quiet periods. On average they would take around 2,000 orders per week. However, at certain times, such as Christmas and Valentine’s Day, their orders would go through the roof, often to 2,000 per day! When they were really busy, they were hitting five orders per second!)

Well, their server couldn’t really handle this much load. With that many orders, you can imagine how many web pages and graphics needed to be generated and sent back to customers. For every order that was placed, dozens of people could have been browsing the web site. During the normal periods, their server really had no problem coping with the load. However, during the high periods, the server would start to have problems. The upshot? Abandoned orders. People who were simply browsing found the site so slow that they gave up and went elsewhere.
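As a brief aside, it can help to put the order figures from this anecdote on a common per-second scale. The short Python sketch below uses only the numbers quoted above (2,000 orders per week, 2,000 per day, and a peak of five per second); the variable and function names are ours, chosen purely for illustration.

```python
# Put the three order rates quoted in the anecdote on the same scale.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def per_second(orders: float, period_seconds: int) -> float:
    """Average number of orders per second over the given period."""
    return orders / period_seconds


normal_rate = per_second(2000, SECONDS_PER_WEEK)   # about 0.0033 orders/sec
busy_day_rate = per_second(2000, SECONDS_PER_DAY)  # about 0.023 orders/sec
peak_rate = 5.0                                    # quoted peak, orders/sec

print(f"normal week average: {normal_rate:.4f} orders/sec")
print(f"busy day average:    {busy_day_rate:.4f} orders/sec")
print(f"peak vs normal:      {peak_rate / normal_rate:.0f}x")
print(f"peak vs busy day:    {peak_rate / busy_day_rate:.0f}x")
```

The point of the comparison is that averages hide the spikes: the quoted peak is hundreds of times higher than even the busy-day average, and it is the peak, not the average, that the servers have to survive.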
Those that got through the order process only to get a timeout on the payment page (and we all know how annoying that is) simply gave up and closed the browser.

The obvious solution would be to upgrade the system so it could handle this much load. However, they looked into the pricing and found that such a machine would be many times the price of their current server. It was decided that rather than buy one big machine, they would buy two machines of a similar specification to the existing server and then use a load balancing solution.

This solution had quite a few benefits. First, it was significantly cheaper to buy two more reasonably specced machines than it would have been to buy a much higher specced server that could handle the load all by itself. It also meant more capacity for handling future growth because, generally speaking, two servers provide more resources than a single server. If one server failed, they could still fall back to the other. This also meant that they could take a server down for maintenance without having to take down the web site.

These sorts of solutions are quite common and are a very effective way of deploying critical web sites where performance and reliability are key. This book will get you up to speed on the basics and will give you all you need to know in order to get your web sites riding the performance tiger!

What Is Load Balancing?

OK, you’ve seen what load balancing can do, but what actually is it? The term has been around for many years and chances are that anyone who hasn’t kept up with development of the Web will think it has to do with electricity and power stations. Before you laugh, they’re quite correct, so this is a great place for you to start your load-balancing journey! No really, it is!

Load Balancing, Old Style

Most things in the home use electricity and that electricity comes from the power grid. To generate electricity, we need power stations of some form or another.
That much is fairly obvious and straightforward. However, at different times of the day the requirement for power changes. In the morning, when people are getting up and getting ready for work, there is a large demand. People are turning on kettles, toasters, ovens, and other high-usage appliances. The grid needs to make sure that there is enough power available for everyone to do this. But what happens when people go to work? Suddenly not so much power is needed, but you can’t just turn off a power station. It’s also quite possible that when everyone starts making breakfast, the load placed on the grid is higher than what the individual power stations can supply.

Fortunately, it’s load balancing to the rescue. By storing power when it’s not being used during the off-peak times, the grid is able to provide higher amounts of power during the on-peak times.

So, how is this like load balancing in the computing world? Well, it all comes down to having finite resources and trying to make the best possible use of them. You have the goal of making your web sites fast and stable; to do that you need to route your requests to the machines best capable of handling them.

Load Balancing, New Style

In computing terms, you’re going to be doing something similar. People put load on your web site by making lots of requests to it. This is a normal state of affairs and is, after all, why you have a web site in the first place. No one wants a web site that nobody looks at. But what happens when people turn on their appliances and start stressing your server? At this point things can go bad; if the load is too high (because too many people are visiting), your web site is going to take a performance hit. It’s going to slow down, and with more and more users, it will get slower and slower until it fails completely. Not what you want.

To get around this, you need more resources.
You can either buy a bigger machine to replace your current server (scale up) or you can buy another small machine to work alongside your existing server (scale out).

Scaling Up

Scaling up is quite common for applications that just need more power. Maybe the database has grown so large that it won’t fit in memory like it used to. Perhaps the disks are getting full or the database needs to handle more requests than it used to and needs more processing power.

Databases are generally a good example for scaling up because traditionally they had severe problems when run on more than one machine. This is because many of the things you can take for granted on a single machine simply break when you try to make them work on more than one. For example, how do you share tables across the machines efficiently? It’s a really hard problem to solve, which is why several new databases, such as MongoDB and CouchDB, have been designed to work in a very different way.

Scaling up can be pretty expensive, though. Usually when you get above a certain specification, servers suddenly leap in price. Now the machine comes with a high-spec RAID controller, enterprise-grade disks, and a new type of processor (that mysteriously looks and performs like the previous one but has a much higher price tag). If you’re just upgrading components, it might be cheaper to scale up rather than out, but you will most likely find that you will get less bang for your buck this way. That said, if all you need is an extra couple of gigabytes of RAM or some more disk space, or you just need to boost the performance of a particular application, this might be your best solution.

Scaling Out

This is where things start to get interesting and the reason why you actually picked up this book. Scaling out is when you have two or three machines rather than a single machine. An issue with scaling up is that at some point you hit a limit that you can’t cross.
There is only so much processing power and memory a single machine can hold. What happens if you need more than that? Many people will tell you you’re in an enviable position if you have so many visitors that a single machine just can’t take the load. This is a nice problem to have, believe it or not!

The great thing about scaling out is that you can simply keep adding machines. Sure, at some point you will start to hit space and power issues, but you will certainly have more compute power by scaling out than you could get by scaling up. In addition, when you scale out, you have more machines. So if one machine were to fail, you still have other machines that can take on the load. When you scale up, if that one machine fails, then everything fails.

Scaling out does have one problem and it’s a big one. The scenario is this: you operate a single cohesive web site or web application and you have three machines. How do you make all three machines operate together so that they give the impression of a single machine? Load balancing!

Load Balancing, Finally

Yes, you’re on page four and we haven’t as yet really talked about load balancing, which might seem somewhat odd considering that’s what this book is all about. But fear not, we’re getting into the juicy stuff now! And you’re also going to look at caching, which (honestly) goes hand in hand with load balancing and can give you an insanely big performance boost!

But first, back to load balancing. As mentioned, the biggest challenge in load balancing is trying to make many resources appear as one. Specifically, how do you make three servers look and feel like a single web site to your customer?

What Makes the Web Tick?

The first stop in this journey is to look at how the Web holds together. When you click the Go button on your browser, what happens under the covers? This book will go into quite a bit of detail, even looking briefly at the TCP (Transmission Control Protocol, the protocol that powers the web) layer.
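To make the earlier question of how three servers can look like one web site a little more concrete, here is a minimal Python sketch of one simple strategy, round robin, in which each incoming request is handed to the next server in turn. This is our own illustration rather than a technique taken from the text at this point; the class name and the `example.com` host names are hypothetical, and a real load balancer would also need to cope with failed servers and uneven load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, one per request (illustrative only)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the list of backends endlessly, in order.
        self._rotation = cycle(backends)

    def next_backend(self):
        """Return the backend that should handle the next request."""
        return next(self._rotation)


# Hypothetical host names; visitors would only ever see the one public site.
balancer = RoundRobinBalancer(
    ["web1.example.com", "web2.example.com", "web3.example.com"]
)

# Six requests are spread evenly: each server receives two of them.
assignments = [balancer.next_backend() for _ in range(6)]
for request_number, backend in enumerate(assignments, start=1):
    print(f"request {request_number} -> {backend}")
```

Round robin is about the simplest policy there is, which makes it a handy mental model: the visitor talks to one address, and something in the middle quietly decides which machine does the work.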
It has been our experience that while someone might be able to produce the most awe-inspiring web applications, they’re not necessarily very clued up on the lower level things that make it all possible. In reality this isn’t an issue because, by design, you don’t need to know the innards of the Internet in order to write kickass software. However, if you want to make your software scream past the competition at high speed, you need a much better appreciation of how it all hangs together.

Not interested? Don’t worry. If you really can’t bring yourself to read Chapter 2, you can still get a great deal out of the book. You will be able to make your web sites go faster and you will be able to make them more resilient and reliable. But you risk not knowing exactly why this is happening and that means you might end up creating web sites that have the same problems over and over again.

Caching: Warp Drive for Your Web Site

Chapter 3 gets you started on caching and shows how even the simplest of techniques can make your web site simply scream. If you don’t get the sci-fi reference in the subtitle, it just means that your web application is going to be very, very fast indeed!

We start with the basics: getting your application to tell the world what it can and can’t cache, why this is important, and why you need to be somewhat careful with this approach as you might end up caching things that you hadn’t intended.

Next, you look at how you can get more control over what you’re caching and how. This is where the power really starts, but amazingly enough it’s really easy to do. In our experience, it is this part of caching that pays out in spades. There is a reason web sites like Facebook and Twitter use these techniques, and now you’ll be joining them! Make it so!

Load Balancing with DNS

Chapter 4 takes on the most under-appreciated form of load balancing around.
Load balancing with DNS (Domain Name System, the phonebook of the Internet) lacks some of the finesse and power of the other techniques but it’s far and away the easiest solution to set up. In seconds you can have load balancing up and running. Not too shabby!

We’ll also introduce some of the problems you’re likely to walk into when you do any form of load balancing, not necessarily specific to using DNS. We’ll show you some potential solutions for the problems and where to start looking if you are doing something a bit different.

You’ll get to appreciate how DNS works and why it does what it does, and then you’ll take a look at anycast DNS, a technique used for content delivery networks and a very powerful tool indeed for load balancing across entire countries!

Content Delivery Networks

Following on from the last chapter, Chapter 5 looks at content delivery networks (CDNs) and how they can help you boost performance. This isn’t something you can build yourself; fortunately, such networks already exist, and you can take advantage of them. Want to get your downloads as close to your customers as possible? Want to serve your static resources (such as images, JavaScript, and CSS files) without putting load on your already stressed servers? A CDN could be just what you’re looking for!

We cover Rackspace Cloud as we have been really impressed with their cloud offerings. However, the principles can be applied with any CDN provider so you are in no way restricted to one provider. In fact, since we’ve been using Rackspace Cloud, they’ve changed their CDN provider at least twice without affecting how the system is used by developers and end users. How cool is that?

CDNs are big business and they are becoming critical to web sites that specialize in moving content around the world.
If you ever see “CDN” in a web site name (take a look at your status bar next time you load a page from Facebook) you will know that they are leveraging this technology to give their sites a big performance boost. Now you can do the same!

Proper Planning Prevents Pretty Poor Performance

There is a reason why the 6P principle is taught to military officers around the world. If you have a plan, even if it’s not a great one, you stand a much greater chance of success than if you just sat down and tried to wing it.

Chapter 6 looks at some basic principles that you can follow to make sure your web applications are nicely layered, easy to scale, and grow with demand. There are no magic tricks here, just good old-fashioned advice from people who have sometimes forgotten it and dug themselves into a nice deep hole. It’s dark down there and it’s often not easy to get out without a lot of hard work and effort. If you follow our advice (or just some of it) you’ll be going a long way towards avoiding that particular pit…

The Essentials

Before we get into heavy-duty load balancing techniques, we provide a quick look at some essential ideas that you need to know before delving in. Chapter 7 is a short chapter, just a primer that will make your journey through the remaining chapters that much easier. We also look at some of the more exotic concepts of load balancing and even though we won’t go into those in a great deal of depth, the information will prevent you tripping up if you come across some content on the Web that mentions them.

HTTP Load Balancing

Chapter 8 is where the fun really starts. At this stage your site is so busy it needs at least two servers to keep things running smoothly. Don’t worry; we’ve got you covered! First, though, we’ll look at how you can optimize your individual web servers to get the most out of them before you load balance them. We compare Apache, the most common web server on the planet, with Nginx, one of the fastest, meanest web servers around.
Can Nginx really give your site a performance boost? If so, when is it best to use one or the other? This chapter bares all! We also look at some of the features that improve performance such as enabling compression, disabling DNS lookups for logging, removing unneeded modules, and much more!

Load Balancing Your Database

The database is historically the slowest component of any web application. It doesn’t matter how fast you make the machine or how much tuning you do, when compared to the speed you can serve dynamic HTML, databases just can’t keep up. When you need more speed or can’t tolerate database failure (and who can?) you need to do some database clustering. Chapter 9 covers how to create a MySQL cluster and how to load balance across it.

Load Balancing Your Network Connection

What do you do when 1Gb/sec of network bandwidth isn’t enough and you don’t want to take out a mortgage to pay for a 10Gb-capable network? You take two 1Gb connections and use them as one! Do you want redundancy? Do you want to double your throughput? Chapter 10 will take you through how to get the most out of your network infrastructure. We show you the various ways that the network can be configured to give you the fastest and most reliable connection to the network possible!

SSL Load Balancing

Secure web pages are the core of e-commerce and all modern interactive web applications. These days if the padlock isn’t golden, people won’t be using your web site. The problem is that handling 10,000 SSL connections can be somewhat intensive for an application server. In fact, you may well find that you’re spending more time handling the SSL connections than executing your application!

Chapter 11 starts off by giving an overview of how PKI encryption works and why it’s so important (and resource intensive) to use it.
We show you how to generate your own keys and certificate requests so that you can easily get your hands on a trusted certificate from one of the many certificate authorities out there today. We will also show you how to generate and sign your own certificates so you can begin testing immediately without having to pay for a trusted certificate.

We’ll then show you how to handle SSL termination with Nginx and how to spread the load across more than one machine. We also touch on some security issues and how you can use some simple settings to help improve the security of your web application.

Clustering for High Availability

It’s all well and good having a load balancer to spread the load but now you have introduced a single point of failure. If your load balancer goes down, it doesn’t matter that the 10 servers behind it are healthy; nothing will be able to get to them. In Chapter 12 we discuss solutions to make sure that this doesn’t happen by building a cluster. If one load balancer fails, the other will take over the load.

Load Balancing in the Cloud

The cloud is the next big thing; with it a whole host of new challenges will be joining the current list that you have to deal with. Chapter 13 will introduce the cloud, what makes it tick, and the things you need to know when looking at the performance of your application. We discuss why not all cloud providers were born equal and why what works well for one web site might not be the best solution for another.

IPv6: Implications and Concepts

The Web is predominantly IPv4-based but unfortunately we’ve pretty much run out of available IP addresses. Depending on who you ask, we have already run out. IPv6 is the replacement and it’s already being rolled out across the world. How does IPv6 work and how will it affect you? You will learn some of the benefits and problems you can expect to deal with when IPv6 becomes mainstream.
For many uses, IPv6 is not significantly or noticeably different from IPv4, but there are some gotchas, especially in the applications discussed throughout the book. Chapter 14 will get you up to speed on applying what you’ve learned in the other chapters to the world of IPv6.
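To give a feel for why the address shortage mentioned above goes away with IPv6, and for the kind of small gotcha an application can hit, the Python sketch below uses the standard library’s `ipaddress` module to compare the two address spaces and to tell the two address families apart. The `describe` helper and the sample addresses (taken from ranges reserved for documentation) are our own illustrative choices, not something from the chapter itself.

```python
import ipaddress

# Total number of addresses in each family: IPv4 is 32 bits, IPv6 is 128 bits.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")


def describe(address: str) -> str:
    """Parse an address string and report which IP version it belongs to."""
    ip = ipaddress.ip_address(address)
    return f"{ip} is IPv{ip.version}"


# Sample addresses from the ranges reserved for documentation and examples.
examples = [describe("192.0.2.10"), describe("2001:db8::10")]
for line in examples:
    print(line)
```

One practical takeaway is that code which assumes an address is always four dot-separated numbers (in log parsers, access lists, or database columns, say) is a likely place for surprises once IPv6 traffic arrives, so letting a library parse addresses is usually the safer habit.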
