Hello, everyone.
My name is Brian Dorsey, and we're going to
talk about load balancing.
So as you're probably aware, load balancing is a critical
part of nearly every scalable service you might want to run
on the internet.
And luckily, Google Cloud Platform Compute Engine has a
very scalable, powerful load balancer built right in.
Basically what you do is you configure a
pool of your instances.
And the Load Balancer will automatically direct traffic
among them, spreading TCP connections and UDP
packets across all of your instances.
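For reference, a pool-plus-load-balancer setup like that can be sketched with the gcloud CLI. The pool name, region, zone, and instance names below are placeholders for illustration, not the ones used in this demo:

```shell
# Define what "healthy" means: poll / on port 80, expect a 200.
gcloud compute http-health-checks create demo-health-check \
    --port 80 --request-path /

# Group your instances into a target pool guarded by that check.
gcloud compute target-pools create demo-pool \
    --region us-central1 --http-health-check demo-health-check

gcloud compute target-pools add-instances demo-pool \
    --instances demo-1,demo-2 --zone us-central1-a

# One external IP in front; traffic is spread across healthy instances.
gcloud compute forwarding-rules create demo-lb \
    --region us-central1 --ports 80 --target-pool demo-pool
```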
As long as they stay healthy, they get traffic.
And if they become unhealthy, the Load Balancer will no
longer send them traffic until they become healthy again.
And you get to define what exactly
healthy is for your instances.
What happens is Compute Engine will send HTTP requests to
your instances.
And when an instance sends a 200 response back,
it's considered healthy.
And if anything else comes back, it's unhealthy.
So it's completely under your control.
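As a sketch of what that contract looks like from the instance's side, here is a minimal Go health endpoint. The handler name and the `healthy` flag are our own illustration, not part of the demo; in a real service the flag would reflect whatever "healthy" means for your application:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// healthy stands in for your app's own definition of healthy; it
// could check a database connection, queue depth, and so on.
var healthy atomic.Bool

// healthHandler is the endpoint the health checker polls. Only a 200
// keeps the instance in rotation; any other status (or a timeout)
// marks it unhealthy until it answers 200 again.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	if healthy.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probe simulates one health check against an in-process test server
// and returns the HTTP status code it saw.
func probe() int {
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()
	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func main() {
	healthy.Store(true)
	fmt.Println(probe()) // 200: instance stays in rotation

	healthy.Store(false)
	fmt.Println(probe()) // 503: drained until it reports healthy again
}
```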
Let's go ahead and take a look at the demo.
So what we've got here is an App Engine application.
When I hit this Start VMs button, it makes a request to
the Compute Engine API, and we spin up some
instances to run this demo workload.
It's a fractal generator.
And as the instances come up, they run a startup
script that downloads a Go-language fractal generator
and starts it running.
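A startup script for that might look roughly like the following. The download URL and binary name are made up for illustration, since the demo's actual script isn't shown:

```shell
#!/bin/bash
# Hypothetical startup script: fetch the fractal server and run it.
curl -sSL -o /opt/fractal https://example.com/fractal-server
chmod +x /opt/fractal
/opt/fractal --port 80 &
```

You attach a script like this to an instance as startup-script metadata, and Compute Engine runs it on boot.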
When an instance goes green here on the demo, that
means it's up and starting to go through
its boot process.
And as soon as we start seeing check marks, those
instances are actually running the fractal software.
And this is basically intended as a proxy for your
application.
You can imagine any sort of CPU-heavy application--
this is taking the place of that in the demo.
So let me go ahead and show the fractals here.
On the left-hand side, we've got a single instance
serving this up.
And on the right-hand side, right
now we have 10 instances.
So we can go ahead and add more VMs.
And as we zoom around, even though we have new VMs coming
up, we see both sides working well.
And on the right, things are coming in faster
because multiple instances are actually cooperating
to serve this up.
And that's all transparent as far as this client
application's concerned.
Each side is just hitting one IP address, and the answers
are coming back.
If something were to go wrong with one of your instances --
in this case, we'll head behind the scenes and cause the
failure ourselves -- the Load Balancer will route around it.
We're going to go ahead and get rid of number two.
And the numbering is zero-based here, so we should see this
one drop out, and there it goes.
And we can still zoom around, and nothing has changed as far
as our client application's concerned.
It's still sending and receiving requests, still
moving fast.
We've added new instances in and we've dropped one out, and
all of that's transparent to both our application and all
of our clients.
So let's come back.
Thanks for that.
And I also want to stress that this is not just a virtual
machine serving this traffic up.
This is Google's network infrastructure acting as your
load balancer.
So our network infrastructure is actually routing your data
to your instances.
And what that means is, if you have a big spike in traffic,
you don't have to wait for virtual machines to warm up in
order to handle that traffic.
Google's network infrastructure will
handle it for you.
So please give it a try.
We'll have some links to the docs and other details down
below the video.
So take care, and happy load balancing.