Today's question comes from Cardiff, UK. Tristan Perry asks: "Hi Matt. Could you give any insight into the sort of hardware and/or server-side software which power a typical Googlebot (web crawler) server?" What a fun question!
So one of the secrets of Google is that rather than employing mainframes, that heavy "big iron" kind of hardware, if you were to go into a Google data center and look at an example rack, it would look a lot like a PC. It's built from commodity PC parts, the sort of thing where you'd recognize a lot of the components from having opened up your own computer.
And what's interesting is that rather than having special Googlebot web-crawling servers, we tend to say: OK, build a whole bunch of servers that can be used interchangeably for things like Googlebot, or web serving, or indexing. Then we have this fleet, this armada of machines, and you can deploy it on different types of tasks and different types of processing. So hardware-wise they're not exactly the same, but they look a lot like regular commodity PCs, and there's no fundamental difference between Googlebot servers and regular servers at Google. You might have differences in RAM or hard disk, but in general it's the same sort of stuff.
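To make that interchangeability concrete, here's a minimal sketch of the idea in Go. None of these names come from Google's actual infrastructure; the Machine type, the Role constants, and the Assign method are all hypothetical, just illustrating a fleet of generic machines being repurposed across crawling, serving, and indexing.

```go
// A toy sketch of the "interchangeable fleet" idea: generic machines
// that can be assigned to any role. All names here are hypothetical,
// not Google's actual scheduling API.
package main

import "fmt"

type Role string

const (
	Crawl Role = "googlebot"
	Serve Role = "web-serving"
	Index Role = "indexing"
)

type Machine struct {
	Hostname string
	RAMGB    int // machines may differ in RAM or disk, but any can take any role
	role     Role
}

// Assign repurposes a machine: a software reassignment, not a hardware change.
func (m *Machine) Assign(r Role) {
	m.role = r
}

func main() {
	fleet := []*Machine{
		{Hostname: "rack1-node01", RAMGB: 16},
		{Hostname: "rack1-node02", RAMGB: 32},
		{Hostname: "rack2-node01", RAMGB: 16},
	}
	// Deploy the same commodity machines onto different kinds of work.
	roles := []Role{Crawl, Serve, Index}
	for i, m := range fleet {
		m.Assign(roles[i%len(roles)])
		fmt.Printf("%s (%d GB RAM) -> %s\n", m.Hostname, m.RAMGB, m.role)
	}
}
```

The appeal of the design is that capacity planning becomes a software decision: moving machines between crawling and serving is a reassignment rather than a procurement cycle.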
Now as far as server-side software goes, there's a little bit of a joke at Google that we don't just build the cars ourselves, and we don't just build the tires ourselves; we actually vulcanize the rubber on the tires ourselves. So we tend to look at everything all the way down to the metal. If you think about it, there's data center efficiency, there's power efficiency on the motherboards. If you can keep an eye on everything all the way down, you can make your stuff a lot more efficient and a lot more powerful. You're not wasting anything because you used some outside vendor's product that's a black box.
So Google tends to use a lot of Linux-based machines, Linux-based servers. We've got a lot of Linux kernel hackers, and we tend to have software that we've built pretty much from the ground up to do all the different specialized tasks.
So even to the point of our web servers: we don't use Apache, and we don't use IIS. We use something called GWS, which stands for the Google Web Server. By having our own binaries that we've built from our own stuff, and building that stack all the way up, it really unlocks a lot of efficiency. It means there's nothing you can't go in and tweak to get performance gains, or fix if you find bugs.
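To give a feel for what "building the web server yourself" buys you, here's a toy sketch in Go. This is emphatically not GWS, whose internals aren't described here; it's just a bare-bones HTTP server showing that when the whole binary is your own code, every layer is available to profile, tweak, or patch.

```go
// A minimal sketch of a homegrown web server, assuming nothing about GWS.
// The point: with no black-box vendor layer, every line of the serving
// path is yours to inspect and change.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Because this handler is our own code, any performance issue
		// or bug here is something we can see and fix directly.
		fmt.Fprintf(w, "hello from a homegrown server: %s\n", r.URL.Path)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```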
So that's just a little bit of a view of the hardware and software side, as far as what goes on behind Googlebot and crawling the web.