Hey everybody.
This is Matt Cutts.
And today I wanted to talk about a website
that you might have visited or you might not.
It's called How Search Works.
And if you haven't seen it, I highly
encourage that you check it out.
I think if you search for How Search Works
on your favorite search engine, you'll
probably be able to find it.
But it's just a small site that talks about advances
we've had in crawling, our algorithms, how we fight spam.
And we've even made public a lot of our removal policies.
Some really concrete, nitty gritty stuff.
Even if you have seen the site before,
I wanted to sort of walk you through some of the things
that you might not have noticed and which
are actually quite nice.
So if you go to the main page of How Search Works and you sort
of scroll through a little bit, you'll
notice that it's almost like an infographic.
But it's actually interactive.
So you can click around and find all kinds of fun Easter eggs.
One of the things that I really enjoy
is if you get down to the very bottom of the page,
it'll tell you, you've been on this page for 150 seconds,
or something like that.
In that time, Google has actually handled,
you know, 5.7 million searches, or something along those lines.
It's kind of interesting.
I think we've said before that Google handles
over two billion searches a day or something along those lines.
But it's pretty neat to just figure out the amount of math.
And you could extrapolate and say, OK,
how many searches a day does Google have.
Now that's, I think, a static counter based
on when the site launched.
But it's kind of a fun little interesting thing to play with.
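The extrapolation mentioned above can be sketched in a few lines of Python. This is only a rough illustration: the 150 seconds and 5.7 million searches are the approximate figures quoted in the talk, and the names used here are made up for the example.

```python
# Approximate figures quoted in the talk ("or something like that").
SECONDS_ON_PAGE = 150
SEARCHES_IN_THAT_TIME = 5_700_000

SECONDS_PER_DAY = 24 * 60 * 60


def searches_per_day(searches: int, seconds: int) -> float:
    """Scale a count observed over `seconds` up to a full day."""
    return searches / seconds * SECONDS_PER_DAY


per_second = SEARCHES_IN_THAT_TIME / SECONDS_ON_PAGE
per_day = searches_per_day(SEARCHES_IN_THAT_TIME, SECONDS_ON_PAGE)

print(f"{per_second:,.0f} searches per second")         # 38,000 searches per second
print(f"{per_day / 1e9:.2f} billion searches per day")  # 3.28 billion searches per day
```

With those inputs the estimate comes out at a bit over three billion searches a day, which is consistent with the "over two billion" figure mentioned in the talk.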
As you look through the site, you'll
also find that we talk a lot about how we do evaluation.
So we've got some videos about how we evaluate search quality.
Just to remind people of a misconception a lot of people
have, we do evaluate new algorithms
and then send them out to what we call quality raters.
And they look at whether they get
one set of search results on the left
and one set of search results on the right.
And they have to decide which one looks better to them.
And they don't know what algorithm's being evaluated.
And whenever they vote, we take that data and we say, OK,
which search results got better.
And which ones got worse?
But we don't take those votes, those ratings from the quality
raters, and directly apply them in our ranking algorithm.
Now what's kind of fun is, we actually
show the funnel for the things in a recent year-- I think
it was 2012-- where we went through 118,000 ideas, where we
just played around with a new way of generating search
results.
And, using the ratings that we'd already
gotten from quality raters, we were
able to say, oh, in general, this
looks like a promising experiment, for example.
From there, we did 10,000 what we call side-by-sides, where
again, you get these side-by-side sets of search
results.
And it's like a blind taste test.
And you ask people, which one do you like better.
Based on that, we did 7,000 of what we call live traffic
experiments, where we actually take an experiment
and we put it out on our main website.
And we look at how often people click on various search results
to try to determine whether we're actually
making the search results better.
And so the net result was that we
were able to launch 665 algorithmic changes, things
that changed on our search results
page in 2012, which is kind of interesting.
To put that into context, that's roughly two changes
to how we generate the search results page every single day,
for the entire year.
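As a rough check on that funnel, the short Python sketch below works out how much of each stage survives to the next and the launches-per-day figure. The counts are the rounded numbers from the talk, the year is assumed to be 2012 as the speaker recalls, and the stage labels are just illustrative names.

```python
# Rounded stage counts as quoted in the talk; labels are illustrative.
funnel = [
    ("ideas evaluated", 118_000),
    ("side-by-side experiments", 10_000),
    ("live traffic experiments", 7_000),
    ("launched changes", 665),
]
DAYS_IN_2012 = 366  # 2012 was a leap year

# Compare each stage with the one before it.
for (name, count), (_, prev) in zip(funnel[1:], funnel):
    print(f"{name}: {count:,} ({count / prev:.1%} of previous stage)")

launches_per_day = funnel[-1][1] / DAYS_IN_2012
print(f"{launches_per_day:.2f} launches per day")  # 1.82 launches per day
```

So "roughly two changes every single day" is a fair rounding of about 1.8 launches per day.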
So it's kind of funny when people come and ask, well OK,
what happened on such and such date.
Because there's usually a lot of stuff happening,
things rolling out, new data being deployed.
And those are actual changes, not just
data being refreshed, that we're talking about.
So that gives you a little bit of a feel
for the scale of how many different changes we're
exploring at any given point.
Now the part of How Search Works that I enjoy the most
is the spam section.
And there's a lot of nitty gritty detail there.
We went into all kinds of information
that you might not have seen before.
So for example, there's a spam carousel.
And that is updated periodically.
So you actually get to see spam right after we've removed it.
So we'll show you a screenshot so that you don't
run into danger of getting infected
by malware or something.
But it's literally like you can watch over our shoulder
as we're removing spam.
And so you get a chance to see the sorts of stuff
that we have to deal with every single day.
Right below the spam carousel, you'll
see that we have different types of spam.
So we talk about the categories of spam.
I think that's pretty helpful to know because that lets
you know the sorts of stuff that we have to deal with.
So the major categories are cloaking or sneaky redirects,
hacked sites, hidden text or keyword stuffing,
parked domains, pure spam, which is just
another name for black hat, when it's like, any savvy user would
hopefully be able to recognize it as absolute spam.
Things like spamming free hosts, or dynamic DNS providers,
thin content with very little added value, unnatural links
from a site, unnatural links to a site-- and then
user-generated spam, where you might have good content up
front, but maybe so many spam comments
that it's actually causing bad search results or a bad user
experience.
So there are more specific, more granular, more detailed things,
within each one of those.
So unnatural links from a site might
involve someone who was selling links
that pass PageRank, for example.
But that gives you an idea of the overall categories
that we look at whenever we're actually fighting spam.
The other thing that's kind of interesting
if you surf down the page and look a little bit,
is we give you several different graphs.
We actually tell you month by month
the actions that we've taken on spam, so what types of actions
and how many actions we took.
And if you look, you'll see that the vast majority of what
we tackle is what we classify as pure spam or black hat spam.
So that just means that it's stuff that, you know,
it's gibberish, it's something that anybody
would be able to recognize if they're sufficiently savvy.
It might be machine-generated, auto-generated sort
of spam, hopefully the sort of thing
that anybody would look at and be like, wow,
I hope I don't see that in my search results.
Something that you might not notice
is the next biggest category within recent years
has been hacked sites.
And it's kind of funny, because back in 2010, there
was some SEO who wrote something like,
what's the web spam team been doing.
I haven't seen a lot of action from them recently.
And we were actually engaged in a pitched battle,
hand-to-hand combat on hacked sites, which,
if you're just a regular SEO, or even a regular black hat SEO,
back then you might not have noticed as much.
So it's not the case that we were taking a break
or taking things easy.
We were working very hard on spam.
It was just a type of spam that most people
hadn't encountered yet.
And we're going to keep working on all those kinds of things.
So you can get those kinds of insights
when you look through these graphs
and see, OK, this is the history of the sort of stuff
that Google has had to tackle in terms of spam.
What's also interesting is we've started
to do more and more messaging over time.
Now we could probably do better and think about other ways
to get more concrete, more actionable messages
to webmasters.
And we're going to keep exploring that.
But when you look at the milestones in terms
of what we've done in terms of communication,
it actually is pretty exciting.
And you can see the volume spike up
as we've started to give more and more information.
At this point, for pretty much any direct action
that the manual webspam team takes that affects
your ranking, the webmaster will get a message about that.
And that's really helpful, because at least you
know that there's an issue.
And you can start to deal and dig into it
and start to investigate a little bit.
So it's kind of interesting.
You know, I'm looking at one graph that says,
in January of 2013, we sent over 431,000 messages
as a result of actions that we took on the webspam team.
And so the other thing that you should think about
is the scale at which we're operating.
Now remember, that's manual webspam actions,
which then generated some sort of message to the webmaster.
The idea that we could have a one-on-one conversation
with 431,000 different owners of websites sort of
shows you the scale that we're operating at and why it's hard,
and why, so far, we haven't figured out
a way to have a one-on-one conversation
with every single webmaster who wants
to rank number one, or rank highly,
or has questions about potential webspam action.
But what you can see below that is a graph
that shows the reconsideration requests that
have been submitted.
And so for a random week in 2013,
there were roughly 5,000 reconsideration requests.
And bear in mind, this is interesting.
So over a month, 430,000 messages go out.
And then in a week, we get 5,000 reconsideration request
messages.
So if you take that week-long baseline
and turn it into a month, call it
about 20,000 reconsideration request processing messages
that we handle during a month.
Now what's interesting about that
is, if you do the math, that basically means,
of all the people we alert of manual webspam action,
right now at least, less than 5% of those people request
reconsideration.
So that actually means that most of the time we're killing spam
and the spammers are not saying, hey this is not right.
I want to contest this.
They're actually saying, OK you caught me.
I'm going to move on to try to do it on a different URL
where you won't catch me next time.
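The "do the math" step above can be written out as a small Python sketch. It uses the talk's own figures and its rounding of four weeks to a month, and it assumes, purely for illustration, that each reconsideration request corresponds to one of the messages sent out.

```python
# Figures quoted in the talk.
MESSAGES_PER_MONTH = 431_000  # messages sent in January 2013
REQUESTS_PER_WEEK = 5_000     # reconsideration requests in a random 2013 week
WEEKS_PER_MONTH = 4           # the talk's rounding, not an exact calendar value

requests_per_month = REQUESTS_PER_WEEK * WEEKS_PER_MONTH
# Assumption: one request maps to one message, so this is a share of messages.
share = requests_per_month / MESSAGES_PER_MONTH

print(f"{requests_per_month:,} requests per month")           # 20,000 requests per month
print(f"{share:.1%} of messages lead to a reconsideration request")  # 4.6% ...
```

That ratio of about 4.6% is where the "less than 5%" figure comes from.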
So it's kind of neat to take some of these numbers
and compare them out and play a little bit with realizing what
insights can we get from these kinds of graphs.
And it shows you the scale of the problem.
If you have 20,000 people a month
who want to talk to you about why they think
their website should rank highly when we think that it has
at least violated the guidelines,
you see the sort of difficulties we
have in trying to talk to everybody.
We'll keep trying to do better.
We'll keep trying to be more transparent.
But I think it's fantastic that we've
got this How Search Works website.
We've got some dashboard where you
can see how things are going.
And you can even see live examples of spam
as they get thrown out.
So we'll keep looking at ways to make things even better.
But I think you'd really enjoy the website.
If you get a chance to check it out,
dig in and just absorb some of the information that's
available on the website.
Thanks very much.