Removing URLs from Google

March 9th, 2010 by Carl | Filed under Analysis.

Normally, as SEOs, we concentrate on getting the search engines to crawl and index as much of a site as possible. Occasionally, however, there are reasons to remove content from search engines, for example if spammers have infiltrated your site and planted links to their own sites.
There are a number of ways to tell the search engines what to do.

.htaccess – password protect the content you really don’t want search engines to index (you have to do this before they index it). This is a strong method, because crawlers cannot get past the login to read the content inside.

Not linking to the page from anywhere. This can work because there are no links for the search engines to follow. However, it is not completely secure: someone might visit the page and then go on to another site that publishes its referrers, which creates a link for the search engine to follow.

Robots.txt – a fairly powerful way of telling Google not to crawl pages. As well as blocking a whole site, rules can target individual directories or pages, but they should be used cautiously. With Google, even if parts of the site are blocked through the robots.txt file, a reference to them can still appear in the index. This can be accompanied by a snippet taken from DMoz, so it can look as though the page has been crawled.
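Before relying on a robots.txt rule, it can help to check which URLs it actually blocks. The sketch below uses Python's standard urllib.robotparser module. The robots.txt contents, the /private/ directory and the example.com URLs are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/private/page.html",
                "http://example.com/public/page.html"):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

Note that this only tests crawling. As described above, a blocked URL can still be referenced in the index.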

Noindex – the noindex robots meta tag blocks any reference to the page in the search index. However, there is inconsistency between search engines in how they interpret it: Google will drop any reference to the page, but Bing (Yahoo) will still show a reference.
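The usual form of this directive is a robots meta tag in the page head, such as <meta name="robots" content="noindex">. As a rough sketch, the Python below checks whether a page carries it, using only the standard library. The class name, function name and sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only for illustration.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Hidden page</body></html>"
)

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```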

Nofollow – can be used to keep a page out of the index, but it is a weak method because every link to the page must be nofollowed for it to work.
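To see why this is fragile, a quick audit can count the links to a page that are missing rel="nofollow". The following is a minimal sketch with the standard library. It only compares href values literally, and the page markup and /secret.html target are invented for the example.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Counts links to a target URL that lack rel="nofollow"."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.followed = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if attr.get("href") != self.target:
            return
        rel_values = (attr.get("rel") or "").lower().split()
        if "nofollow" not in rel_values:
            self.followed += 1


def followed_links(html: str, target: str) -> int:
    """Return how many links to target a crawler would still follow."""
    audit = LinkAudit(target)
    audit.feed(html)
    return audit.followed


# Hypothetical page: one link is nofollowed, one has been missed.
SAMPLE_LINKS = (
    '<a href="/secret.html" rel="nofollow">one</a>'
    '<a href="/secret.html">two</a>'
)

if __name__ == "__main__":
    print(followed_links(SAMPLE_LINKS, "/secret.html"))
```

A single missed link, on your own site or anyone else's, is enough to leave a path for the crawler.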

Google Webmaster Tools – if URLs have already been indexed, you can remove them using this tool. It can be found in a tab under Crawler access. It can be used to remove a whole website, an individual directory or just a single page, and the removal is reversible.

301 Redirects – redirects can be used to remove content from Google. Once the 301 redirect is in place, the old URL should eventually be removed from the search engine index. This is particularly useful because PageRank is also passed from the old URL to the new URL.
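In practice a 301 is usually configured in the web server, but the logic is simple enough to sketch in Python. The redirect map, paths and function name below are hypothetical. The point is only that the old URL answers with status 301 and a Location header pointing at the new URL.

```python
from http import HTTPStatus

# Hypothetical mapping of retired URLs to their replacements.
REDIRECTS = {
    "/old-page.html": "/new-page.html",
}


def resolve(path: str):
    """Return (status, headers) for a request path.

    Paths in REDIRECTS get a permanent redirect; anything else is
    treated as a normal page for the purposes of this sketch.
    """
    target = REDIRECTS.get(path)
    if target is None:
        return HTTPStatus.OK, {}
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}


if __name__ == "__main__":
    status, headers = resolve("/old-page.html")
    print(int(status), headers)
```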
