>> CUTTS: Here is a fun question from Sebastian in Germany. Sebastian asks, “We still have
old content in the index. We block it via robots.txt, use 404s, and delete it via Webmaster
Tools, but Google still keeps it. What can we do to quickly delete content from the index?”
This is a great question. It looks like you’re doing all the right things, so I’d be interested
to find out more details. But let me tell you what most people do, because there is often,
you know, some sort of mistake involved. If you want to remove a single page, you need
to make sure that your web server returns a true 404 code for that single page. So,
for example, if you say “file not found, page not found,” but the HTTP status code
that you return is a 200 and not a 404, then we'll say, oh, okay, this page is still alive
because it’s a 200 code so we won’t process that URL removal request. Instead, we’ll
say, “No, this page is still live.” It needs to be truly gone and truly returning
a 404 before we’ll delete it. So that’s deleting a single page. Now, let’s talk
about deleting an entire site. Because we might not be able to check every single page
on the site, we require that if you want to remove the entire site, it needs to be blocked
in robots.txt. If you do those things (to remove a site, block it in robots.txt; to remove
a page, make sure it truly returns a 404 status code), then everything should go smoothly
in the URL Removal Tool. If it doesn’t, stop by our Webmaster Help Forum and ask,
“Hey, what’s going on?” It’s at google.com/webmasters. You can find the link there. And if you’re
returning the right status code, or you’ve got the site blocked in robots.txt, and we’re still
not removing your content quickly, that’s something we want to know about, because if you don’t
want your content in Google’s index, then we don’t want to return it.
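The single-page failure mode described here, a “page not found” message served with a 200 status code, is often called a soft 404. As a rough illustration (this is a hypothetical helper for checking your own server’s behavior, not Google’s actual detection logic), the distinction looks like this:

```python
def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Heuristic check: a page whose text says 'not found' but whose
    HTTP status is 200 is treated as a live page, so a URL removal
    request for it will not be processed."""
    says_missing = "not found" in body.lower()
    return status_code == 200 and says_missing


# A true 404 is fine; the URL Removal Tool can act on it.
assert not looks_like_soft_404(404, "Sorry, page not found")
# A 200 that merely *says* the page is gone is the classic mistake.
assert looks_like_soft_404(200, "Sorry, page not found")
```

The practical takeaway is to check the actual HTTP status code your server returns for the missing page, rather than trusting what the rendered error page says.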
So those are some simple mistakes. Most people don’t return a true 404 code, and most
people don’t know that if you want to remove an entire site, it has to be blocked in
robots.txt (we do say so somewhere in the documentation), so that we’re not just checking
individual pages. Those handle 90% of the cases where people say, “I tried to remove
things and it didn’t really disappear.” So check out those two factors.