MATT CUTTS: Hi, everybody.
It's Matt Cutts.
And we're back to talk a little bit
about cloaking today.
A lot of people have questions about cloaking.
What exactly is it?
How does Google define it?
Why is it high risk behavior?
All those sorts of things.
And there's a lot of HTML documentation.
We've done a lot of blog posts.
But I wanted to sort of do the definitive cloaking video, and
answer some of those questions, and give people a
few rules of thumb to make sure that you're not
in a high risk area.
So first off, what is cloaking?
Cloaking is essentially showing different content to
users than to Googlebot.
So imagine that you have a web server right here.
And a user comes and asks for a page.
So here's your user.
You give him some sort of page.
Everybody's happy.
And now, let's have Googlebot come and ask
for a page as well.
And you give Googlebot a page.
Now in the vast majority of situations, the same content
goes to Googlebot and to users.
Everybody's happy.
Cloaking is when you show different content to users
than to Googlebot.
And it's definitely high risk.
That's a violation of our quality guidelines.
If you do a search for quality guidelines on Google, you'll
find a list of all the stuff--
a lot of auxiliary documentation about how to
find out whether you're in a high risk area.
But let's just talk through this a little bit.
Why do we consider cloaking bad, or why does Google not
like cloaking?
Well, the answer goes back to the ancient days of search
engines, when you'd see a lot of people doing really deceptive
or misleading things with cloaking.
So for example, when Googlebot came, the web server that was
cloaking might return a page all about cartoons--
Disney cartoons, whatever.
But when a user came and visited the page, the web
server might return something like ***.
And so if you do a search for Disney cartoons on Google,
you'd get a page that looked like it would be about
cartoons, you'd click on it, and then you'd get ***.
That's a hugely bad experience.
People complain about it.
It's an awful experience for users.
So we say that all types of cloaking are against our
quality guidelines.
So there's no such thing as white hat cloaking.
Certainly, when somebody's doing something especially
deceptive or misleading, that's when we care the most.
That's when the web spam team really gets involved.
But any type of cloaking is against our guidelines.
OK.
So what are some rules of thumb to sort of save you the
trouble or help you stay out of a high risk area?
One way to think about cloaking is: take the
page-- say you wget it or you curl it, you
somehow fetch it-- and take a hash of that page.
So take all the different content and boil it down to
one number.
And then you pretend to be Googlebot, with a Googlebot
user agent.
We even have a Fetch as Googlebot feature in Google
Webmaster Tools.
So you fetch a page as Googlebot, and you hash that
page as well.
And if those numbers are different, then that could be
a little bit tricky.
That could be something where you might be
in a high risk area.
Now pages can be dynamic.
You might have things like timestamps, the ads might
change, so it's not a hard and fast rule.
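That check can be sketched in a few lines of Python. This is a hypothetical illustration, not an official tool: in practice you'd fetch the same URL over HTTP twice-- once with a normal browser User-Agent and once with Googlebot's, or via Fetch as Googlebot-- and, as noted above, dynamic bits like timestamps or rotating ads can make the digests differ harmlessly.

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Boil a page's content down to one number (a SHA-256 digest)."""
    return hashlib.sha256(body).hexdigest()

# These byte strings stand in for the two HTTP fetches described above.
page_seen_by_user = b"<html><body>Cartoons for everyone</body></html>"
page_seen_by_googlebot = b"<html><body>Cartoons for everyone</body></html>"

if content_hash(page_seen_by_user) == content_hash(page_seen_by_googlebot):
    print("same content -- low risk")
else:
    print("different content -- worth a closer look")
```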
Another simple heuristic to keep in mind is if you were to
look through the code of your web server, would you find
something that deliberately checks for a user agent of
Googlebot specifically or Googlebot's IP address
specifically?
Because if you're doing something very different, or
special, or unusual for Googlebot--
either its user agent or its IP address--
that's the potential to maybe be showing different content
to Googlebot than to users.
And that's the stuff that's high risk.
So keep those kinds of things in mind.
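As an illustration of what to grep your own codebase for-- this is hypothetical code, not taken from any real server-- the high-risk pattern is a branch keyed specifically on Googlebot:

```python
# The kind of server-side branch that puts you in a high-risk area:
# content keyed on Googlebot's user agent (an IP-range check would be
# the same red flag).
def render_page(user_agent: str) -> str:
    if "Googlebot" in user_agent:
        # Red flag: the crawler sees different content than visitors do.
        return "page tuned just for the crawler"
    return "page that real visitors see"

print(render_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(render_page("Mozilla/5.0 (Windows NT 10.0)"))
```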
Now one question we get from a lot of people who are white
hat, and don't want to be involved in cloaking in any
way, and want to make sure that they steer clear of high
risk areas, is: what about geolocation and mobile user
agents-- so phones and that sort of thing?
And the good news-- the executive sort of summary-- is
that you don't really need to worry about that.
But let's talk through exactly why geolocation and handling
mobile phones is not cloaking.
OK.
So until now, we've had one user.
Now let's go ahead and say this user
is coming from France.
And let's have a completely different user, and let's say
maybe they're coming from the United Kingdom.
In an ideal world, if you have your content available on a
.fr domain, or .uk domain, or in different languages,
because you've gone through the work to translate them,
it's really, really helpful if someone coming from a French
IP address gets their content in French.
They're going to be much happier about that.
So what geolocation does is whenever a request comes in to
the web server, you look at the IP address and you say,
ah, this is a French IP address.
I'm going to send them the French language version, or
send them to the .fr version of my domain.
If someone comes in and their browser language is English,
or their IP address is something from America or
Canada, something like that, then you say, aha, English is
probably the best match, unless they're coming from the
French part of Canada, of course.
So what that is doing is you're making the decision
based on the IP address.
As long as you're not inventing some specific country that
Googlebot belongs to--
Googlandia or something like that--
then you're not doing something special or different
for Googlebot.
At least currently-- when we're making this video--
Googlebot crawls from the United States.
And so you would treat Googlebot just like a visitor
from the United States.
You'd serve up content in English.
And we typically recommend that you treat Googlebot just
like a regular desktop browser-- so Internet Explorer
7 or whatever a very common desktop browser is for your
particular site.
So geolocation--
that is, looking at the IP address and reacting to that--
is totally fine, as long as you're not reacting
specifically to the IP address of just Googlebot, just that
very narrow range.
Instead, you're looking at OK, what's the best user
experience overall depending on the IP address?
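A minimal sketch of that decision, with a toy lookup table standing in for a real geo-IP database (the IP prefixes and country mapping here are made up for illustration): note that there is no Googlebot branch anywhere-- a crawl from a United States IP simply falls through to English like any other American visitor.

```python
# Toy stand-in for a real geo-IP database lookup; prefixes are invented.
COUNTRY_BY_IP_PREFIX = {
    "81.": "FR",   # hypothetical French range
    "51.": "GB",   # hypothetical UK range
    "66.": "US",   # hypothetical US range
}

def language_for(ip: str) -> str:
    country = next(
        (c for prefix, c in COUNTRY_BY_IP_PREFIX.items() if ip.startswith(prefix)),
        "US",  # default when the lookup finds nothing
    )
    # Decide by country, never by "is this Googlebot?".
    return {"FR": "fr", "GB": "en", "US": "en"}.get(country, "en")

print(language_for("81.12.34.56"))  # a French visitor gets "fr"
print(language_for("66.1.2.3"))     # a US visitor -- crawler or not -- gets "en"
```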
In the same way, if someone now comes in--
and let's say that they're coming in
from a mobile phone--
so they're accessing it via an iPhone or an Android phone.
And you can figure out OK, that is a completely different
user agent.
It's got completely different capabilities.
It's totally fine to respond to that user agent and give
them a more squeezed version of the website or something
that fits better on a smaller screen.
Again, as long as you're treating Googlebot like
a desktop user-- so there's nothing special or
different you're doing for its user agent--
then you should be in perfectly fine shape.
So you're looking at the capabilities of the mobile
phone, you're returning an appropriately customized page,
but you're not trying to do anything deceptive or
misleading.
You're not treating Googlebot really differently, based on
its user agent.
And you should be fine there.
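That logic might look like the sketch below (the token list and template names are hypothetical): the branch is on device capabilities inferred from the user agent, and Googlebot's desktop crawler falls through to the ordinary desktop page along with every other desktop browser.

```python
MOBILE_TOKENS = ("iPhone", "Android", "Mobile")

def template_for(user_agent: str) -> str:
    # Branch on device capabilities, not on Googlebot specifically.
    if any(token in user_agent for token in MOBILE_TOKENS):
        return "mobile.html"   # squeezed version for small screens
    return "desktop.html"      # desktop browsers -- and Googlebot -- land here

print(template_for("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0)"))
print(template_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```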
So the one last thing I want to mention-- and this is a
little bit of a power user kind of thing-- is some people
are like, OK, I won't make the distinction based on the exact
user agent string or the exact IP address range that
Googlebot comes from, but maybe I'll,
say, check for cookies.
And if somebody doesn't respond to cookies or if they
don't treat JavaScript the same way, then I'll carve out
and I'll treat that differently.
And the litmus test there is are you basically using that
as an excuse to try to find a way to treat Googlebot
differently or try to find some way to segment Googlebot
and make it do a completely different thing?
So again, the intuition behind cloaking is: are you treating
users the same way as you're treating Googlebot?
We want to score and return roughly the same page that the
user is going to see.
So we want the end user experience when they click on
a Google result to be the same as if they'd just come to the
page themselves.
So that's why you shouldn't treat Googlebot differently.
That's why cloaking is a bad experience, why it violates
our quality guidelines.
And that's why we do pay attention to it.
There's no such thing as white hat cloaking.
We really do want to make sure that the page the user sees is
the same page that Googlebot saw.
OK, so I hope that kind of helps.
I hope that explains a little bit about cloaking, some
simple rules of thumb.
And again, if you get nothing else from this video,
basically ask yourself: do I have special code that looks
specifically for the user agent Googlebot, or the exact IP
address of Googlebot, and treats it differently somehow?
If you treat it just like everybody else-- you serve
content based on geolocation, you look at
the user agent for phones--
that sort of thing is fine.
It's only when you're looking for Googlebot specifically, and
you're doing something different, that you
start to get into a high risk area.
We've got more documentation on our website.
So we'll probably have links to that, if you look at the
metadata for this video.
But I hope that explains a little bit about why we feel
the way we do about cloaking, why we take it seriously, and
how we look at the overall effect in trying to decide
whether something is cloaking.
The end user effect is what we're ultimately looking at.
And so regardless of what your code is, if something is
served up that's radically different to Googlebot than to
users, that's something that we're probably going to be
concerned about.
Hope that helps.