MATT CUTTS: Hi, everybody.
It's Matt Cutts.
And we're back to talk a little bit
about cloaking today.
A lot of people have questions about cloaking.
What exactly is it?
How does Google define it?
Why is it high risk behavior?
All those sorts of things.
And there's a lot of HTML documentation.
We've done a lot of blog posts.
But I wanted to sort of do the definitive cloaking video, and
answer some of those questions, and give people a
few rules of thumb to make sure that you're not
in a high risk area.
So first off, what is cloaking?
Cloaking is essentially showing different content to
users than to Googlebot.
So imagine that you have a web server right here.
And a user comes and asks for a page.
So here's your user.
You give him some sort of page.
Everybody's happy.
And now, let's have Googlebot come and ask
for a page as well.
And you give Googlebot a page.
Now in the vast majority of situations, the same content
goes to Googlebot and to users.
Everybody's happy.
Cloaking is when you show different content to users
than to Googlebot.
And it's definitely high risk.
That's a violation of our quality guidelines.
If you do a search for quality guidelines on Google, you'll
find a list of all the stuff--
a lot of auxiliary documentation about how to
find out whether you're in a high risk area.
But let's just talk through this a little bit.
Why do we consider cloaking bad, or why does Google not
like cloaking?
Well, the answer goes back to the ancient days of search
engines, when you'd see a lot of people doing really deceptive
or misleading things with cloaking.
So for example, when Googlebot came, the web server that was
cloaking might return a page all about cartoons--
Disney cartoons, whatever.
But when a user came and visited the page, the web
server might return something like ***.
And so if you do a search for Disney cartoons on Google,
you'd get a page that looked like it would be about
cartoons, you'd click on it, and then you'd get ***.
That's a hugely bad experience.
People complain about it.
It's an awful experience for users.
So we say that all types of cloaking are against our
quality guidelines.
So there's no such thing as white hat cloaking.
Certainly, when somebody's doing something especially
deceptive or misleading, that's when we care the most.
That's when the web spam team really gets involved.
But any type of cloaking is against our guidelines.
OK.
So what are some rules of thumb to sort of save you the
trouble or help you stay out of a high risk area?
One way to think about cloaking is: take the
page-- say you wget it or you curl it, you
somehow fetch it-- and take a hash of that page.
So take all the different content and boil it down to
one number.
And then you pretend to be Googlebot, with a Googlebot
user agent.
We even have a Fetch as Googlebot feature in Google
Webmaster Tools.
So you fetch a page as Googlebot, and you hash that
page as well.
And if those numbers are different, then that could be
a little bit tricky.
That could be something where you might be
in a high risk area.
Now pages can be dynamic.
You might have things like timestamps, the ads might
change, so it's not a hard and fast rule.
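That check can be sketched in a few lines of Python. This is a hypothetical illustration, not an official tool: in practice you'd fetch the same URL over HTTP twice-- once with a normal browser User-Agent and once with Googlebot's, or via Fetch as Googlebot-- and, as noted above, dynamic bits like timestamps or rotating ads can make the digests differ harmlessly.

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Boil a page's content down to one number (a SHA-256 digest)."""
    return hashlib.sha256(body).hexdigest()

# These byte strings stand in for the two HTTP fetches described above.
page_seen_by_user = b"<html><body>Cartoons for everyone</body></html>"
page_seen_by_googlebot = b"<html><body>Cartoons for everyone</body></html>"

if content_hash(page_seen_by_user) == content_hash(page_seen_by_googlebot):
    print("same content -- low risk")
else:
    print("different content -- worth a closer look")
```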
Another simple heuristic to keep in mind is if you were to
look through the code of your web server, would you find
something that deliberately checks for a user agent of
Googlebot specifically or Googlebot's IP address
specifically?
Because if you're doing something very different, or
special, or unusual for Googlebot--
either its user agent or its IP address--
that's the potential to maybe be showing different content
to Googlebot than to users.
And that's the stuff that's high risk.
So keep those kinds of things in mind.
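As an illustration of what to grep your own codebase for-- this is hypothetical code, not taken from any real server-- the high-risk pattern is a branch keyed specifically on Googlebot:

```python
# The kind of server-side branch that puts you in a high-risk area:
# content keyed on Googlebot's user agent (an IP-range check would be
# the same red flag).
def render_page(user_agent: str) -> str:
    if "Googlebot" in user_agent:
        # Red flag: the crawler sees different content than visitors do.
        return "page tuned just for the crawler"
    return "page that real visitors see"

print(render_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(render_page("Mozilla/5.0 (Windows NT 10.0)"))
```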
Now one question we get from a lot of people who are white
hat, and don't want to be involved in cloaking in any
way, and want to make sure that they steer clear of high
risk areas, is: what about geolocation and mobile user
agents-- so phones and that sort of thing?
And the good news-- the executive sort of summary-- is
that you don't really need to worry about that.
But let's talk through exactly why geolocation and handling
mobile phones is not cloaking.
OK.
So until now, we've had one user.
Now let's go ahead and say this user
is coming from France.
And let's have a completely different user, and let's say
maybe they're coming from the United Kingdom.
In an ideal world, if you have your content available on a
.fr domain, or .uk domain, or in different languages,
because you've gone through the work to translate them,
it's really, really helpful if someone coming from a French
IP address gets their content in French.
They're going to be much happier about that.
So what geolocation does is whenever a request comes in to
the web server, you look at the IP address and you say,
ah, this is a French IP address.
I'm going to send them the French language version, or
send them to the .fr version of my domain.
If someone comes in and their browser language is English,
or their IP address is something from America or
Canada, something like that, then you say, aha, English is
probably the best match, unless they're coming from the
French part of Canada, of course.
So what that is doing is you're making the decision
based on the IP address.
As long as you're not inventing some specific country that
Googlebot belongs to--
Googlandia or something like that--
then you're not doing something special or different
for Googlebot.
At least currently-- when we're making this video--
Googlebot crawls from the United States.
And so you would treat Googlebot just like a visitor
from the United States.
You'd serve up content in English.
And we typically recommend that you treat Googlebot just
like a regular desktop browser-- so Internet Explorer
7 or whatever a very common desktop browser is for your
particular site.
So geolocation--
that is, looking at the IP address and reacting to that--
is totally fine, as long as you're not reacting
specifically to the IP address of just Googlebot, just that
very narrow range.
Instead, you're looking at OK, what's the best user
experience overall depending on the IP address?
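A minimal sketch of that decision, with a toy lookup table standing in for a real geo-IP database (the IP prefixes and country mapping here are made up for illustration): note that there is no Googlebot branch anywhere-- a crawl from a United States IP simply falls through to English like any other American visitor.

```python
# Toy stand-in for a real geo-IP database lookup; prefixes are invented.
COUNTRY_BY_IP_PREFIX = {
    "81.": "FR",   # hypothetical French range
    "51.": "GB",   # hypothetical UK range
    "66.": "US",   # hypothetical US range
}

def language_for(ip: str) -> str:
    country = next(
        (c for prefix, c in COUNTRY_BY_IP_PREFIX.items() if ip.startswith(prefix)),
        "US",  # default when the lookup finds nothing
    )
    # Decide by country, never by "is this Googlebot?".
    return {"FR": "fr", "GB": "en", "US": "en"}.get(country, "en")

print(language_for("81.12.34.56"))  # a French visitor gets "fr"
print(language_for("66.1.2.3"))     # a US visitor -- crawler or not -- gets "en"
```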
In the same way, if someone now comes in--
and let's say that they're coming in
from a mobile phone--
so they're accessing it via an iPhone or an Android phone.
And you can figure out OK, that is a completely different
user agent.
It's got completely different capabilities.
It's totally fine to respond to that user agent and give
them a more squeezed version of the website or something
that fits better on a smaller screen.
Again, as long as you're treating Googlebot like
a desktop user-- so there's nothing special or
different you're doing for its user agent--
then you should be in perfectly fine shape.
So you're looking at the capabilities of the mobile
phone, you're returning an appropriately customized page,
but you're not trying to do anything deceptive or
misleading.
You're not treating Googlebot really differently, based on
its user agent.
And you should be fine there.
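That logic might look like the sketch below (the token list and template names are hypothetical): the branch is on device capabilities inferred from the user agent, and Googlebot's desktop crawler falls through to the ordinary desktop page along with every other desktop browser.

```python
MOBILE_TOKENS = ("iPhone", "Android", "Mobile")

def template_for(user_agent: str) -> str:
    # Branch on device capabilities, not on Googlebot specifically.
    if any(token in user_agent for token in MOBILE_TOKENS):
        return "mobile.html"   # squeezed version for small screens
    return "desktop.html"      # desktop browsers -- and Googlebot -- land here

print(template_for("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0)"))
print(template_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```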
So the one last thing I want to mention-- and this is a
little bit of a power user kind of thing-- is some people
are like, OK, I won't make the distinction based on the exact
user agent string or the exact IP address range that
Googlebot comes from, but maybe I'll,
say, check for cookies.
And if somebody doesn't respond to cookies or if they
don't treat JavaScript the same way, then I'll carve out
and I'll treat that differently.
And the litmus test there is are you basically using that
as an excuse to try to find a way to treat Googlebot
differently or try to find some way to segment Googlebot
and make it do a completely different thing?
So again, the intuition behind cloaking is: are you treating
users the same way as you're treating Googlebot?
We want to score and return roughly the same page that the
user is going to see.
So we want the end user experience when they click on
a Google result to be the same as if they'd just come to the
page themselves.
So that's why you shouldn't treat Googlebot differently.
That's why cloaking is a bad experience, why it violates
our quality guidelines.
And that's why we do pay attention to it.
There's no such thing as white hat cloaking.
We really do want to make sure that the page the user sees is
the same page that Googlebot saw.
OK, so I hope that kind of helps.
I hope that explains a little bit about cloaking, some
simple rules of thumb.
And again, if you get nothing else from this video,
basically ask yourself: do I have special code that looks
specifically for the user agent Googlebot, or the exact IP
address of Googlebot, and treats it differently somehow?
If you treat it just like everybody else-- you serve
content based on geolocation, you look at
the user agent for phones--
that sort of thing is fine.
It's only when you're looking for Googlebot specifically, and
you're doing something different, that you
start to get into a high risk area.
We've got more documentation on our website.
So we'll probably have links to that, if you look at the
metadata for this video.
But I hope that explains a little bit about why we feel
the way we do about cloaking, why we take it seriously, and
how we look at the overall effect in trying to decide
whether something is cloaking.
The end user effect is what we're ultimately looking at.
And so regardless of what your code is, if something is
served up that's radically different to Googlebot than to
users, that's something that we're probably going to be
concerned about.
Hope that helps.