Today's question comes from Cardiff, UK. Tristan Perry asks: "Hi Matt. Could you give any insight into the sort of hardware and/or server-side software which power a typical Googlebot (web crawler) server?" What a fun question!
So one of the secrets of Google is that rather than employing mainframes, that heavy "big iron" kind of hardware, if you were to go into a Google data center and look at an example rack, it would look a lot like a PC. It's built from commodity PC parts, the sort of thing where you'd recognize a lot of the components from having opened up your own computer.
And what's interesting is that rather than having special Googlebot web-crawling servers, we tend to say: OK, build a whole bunch of servers that can be used interchangeably for things like Googlebot, or web serving, or indexing. Then we have this fleet, this armada of machines, and you can deploy it on different types of tasks and different types of processing. So hardware-wise they're not exactly the same, but they look a lot like regular commodity PCs, and there's no fundamental difference between Googlebot servers and regular servers at Google. You might have differences in RAM or hard disk, but in general it's the same sort of stuff.
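To make that interchangeability concrete, here's a minimal sketch of the idea in Go. None of these names come from Google's actual infrastructure; the Machine type, the Role constants, and the Assign method are all hypothetical, just illustrating a fleet of generic machines being repurposed across crawling, serving, and indexing.

```go
// A toy sketch of the "interchangeable fleet" idea: generic machines
// that can be assigned to any role. All names here are hypothetical,
// not Google's actual scheduling API.
package main

import "fmt"

type Role string

const (
	Crawl Role = "googlebot"
	Serve Role = "web-serving"
	Index Role = "indexing"
)

type Machine struct {
	Hostname string
	RAMGB    int // machines may differ in RAM or disk, but any can take any role
	role     Role
}

// Assign repurposes a machine: a software reassignment, not a hardware change.
func (m *Machine) Assign(r Role) {
	m.role = r
}

func main() {
	fleet := []*Machine{
		{Hostname: "rack1-node01", RAMGB: 16},
		{Hostname: "rack1-node02", RAMGB: 32},
		{Hostname: "rack2-node01", RAMGB: 16},
	}
	// Deploy the same commodity machines onto different kinds of work.
	roles := []Role{Crawl, Serve, Index}
	for i, m := range fleet {
		m.Assign(roles[i%len(roles)])
		fmt.Printf("%s (%d GB RAM) -> %s\n", m.Hostname, m.RAMGB, m.role)
	}
}
```

The appeal of the design is that capacity planning becomes a software decision: moving machines between crawling and serving is a reassignment rather than a procurement cycle.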
Now as far as server-side software goes, there's a little bit of a joke at Google that we don't just build the cars ourselves, and we don't just build the tires ourselves; we actually vulcanize the rubber on the tires ourselves. So we tend to look at everything all the way down to the metal. If you think about it, there's data center efficiency, there's power efficiency on the motherboards. If you can keep an eye on everything all the way down, you can make your stuff a lot more efficient and a lot more powerful. You're not wasting anything because you used some outside vendor's product that's a black box.
So Google tends to use a lot of Linux-based machines, Linux-based servers. We've got a lot of Linux kernel hackers, and we tend to have software that we've built pretty much from the ground up to do all the different specialized tasks.
So even to the point of our web servers: we don't use Apache, and we don't use IIS. We use something called GWS, which stands for the Google Web Server. By having our own binaries that we've built from our own stuff, and building that stack all the way up, it really unlocks a lot of efficiency. It means there's nothing you can't go in and tweak to get performance gains, or fix if you find bugs.
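To give a feel for what "building the web server yourself" buys you, here's a toy sketch in Go. This is emphatically not GWS, whose internals aren't described here; it's just a bare-bones HTTP server showing that when the whole binary is your own code, every layer is available to profile, tweak, or patch.

```go
// A minimal sketch of a homegrown web server, assuming nothing about GWS.
// The point: with no black-box vendor layer, every line of the serving
// path is yours to inspect and change.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Because this handler is our own code, any performance issue
		// or bug here is something we can see and fix directly.
		fmt.Fprintf(w, "hello from a homegrown server: %s\n", r.URL.Path)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```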
So that's just a little bit of a view of the hardware and software side, as far as what goes on behind Googlebot and crawling the web.