Duplicate Content

September 29th, 2008 by Carl | Filed under Basic SEO, Duplicate content, SEO tips, URLs.

Introduction

SEO specialists often quote the mantra “Content is King”, and there is nothing a search engine likes better than lots of original content. However, writing original content is time-consuming and therefore costly. Some less scrupulous webmasters copy or plagiarise content from other websites and publish it as their own. Search engines have ways and means of telling whether content is original and will rank pages accordingly.

Even if you are not copying anyone else’s content, there are many other ways in which your website can fall foul of duplicate content, but there is no reason to be paranoid about it. Once you know how it arises, you can do something about it.

How Duplicate Content Arises

There are many ways in which duplicate content can arise, and it doesn’t necessarily mean that you have done anything wrong. Duplication can occur when pages on your website share a substantial amount of the same content. For example, if your pages are generated from a database, each page is assembled from common sections such as the navigation, header and footer. If there is little else on the page, search engines may consider it too similar to other pages to give it any weight.
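Search engines do not publish how they measure similarity, so the following is only an illustration of the idea. It is a minimal Python sketch that compares two pages using word “shingles” (overlapping three-word sequences) and Jaccard similarity, a common textbook technique for near-duplicate detection. The template text and product lines are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 1.0 means identical."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty (or very short) texts count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical template text shared by every database-generated page.
TEMPLATE = (
    "Acme Widgets home products about us contact us "
    "free delivery on all orders over fifty pounds "
    "copyright Acme Widgets all rights reserved terms and conditions"
)

# Two product pages that differ by a single word of real content.
page_a = TEMPLATE + " blue widget in stock"
page_b = TEMPLATE + " red widget in stock"

print(f"{jaccard(page_a, page_b):.2f}")  # prints 0.80
```

Two pages whose only difference is one word of product text still score 0.80, because the shared navigation, header and footer dominate. What threshold, if any, a search engine applies is not public.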

Another way in which duplicate content can arise is through using the manufacturer’s product descriptions or copy from promotional literature. Unless you manufacture a unique product and sell it direct, the things you sell, review or otherwise give information about are probably available on competitor websites, which have the same information as you do.

Where possible, it is better to write your own content to give your site the edge. This is especially true of e-commerce sites, where many product descriptions are taken straight from the manufacturer’s information. Even if you have to include specifications, some explanatory text or a review will help differentiate the page from other sites and give your visitors more information.

Same Content Different URL

E-commerce systems and blogging platforms have greatly increased the scope for duplicate content, because they can pull the same information from a database in many ways, letting visitors see articles or pages sorted by date written, keyword, subject area and so on. These functions are usually controlled by parameters in the URL.

Therefore, you will probably have cases where several different URLs lead to the same page, which creates duplicate content. Solutions depend on the system, but generally involve telling search engines not to index the duplicate URLs, either through robots.txt or through a robots meta tag generated by a scripting language such as PHP.
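To see how parameters multiply URLs, here is a minimal Python sketch that collapses URL variants into one normalised form. It assumes hypothetical parameter names (sort, order, sessionid) that only reorder or track a page without changing its content, and uses yoursite.com as a placeholder domain. The grouping it performs is the one you are asking search engines to make when you noindex or block the duplicates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change how a page is sorted or tracked,
# but not which content it shows.
IGNORED_PARAMS = {"sort", "order", "sessionid"}


def normalise_url(url: str) -> str:
    """Map URL variants that show the same content onto one form."""
    parts = urlsplit(url)
    # Keep only parameters that select content, in a stable (sorted) order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(query),
        "",  # fragments are never sent to the server, so drop them
    ))


variants = [
    "http://www.yoursite.com/shop?cat=widgets&sort=price",
    "http://www.yoursite.com/shop?sort=name&cat=widgets",
    "HTTP://WWW.YOURSITE.COM/shop?cat=widgets&sessionid=abc123",
]
print({normalise_url(u) for u in variants})
# prints {'http://www.yoursite.com/shop?cat=widgets'}
```

Three URLs, one page: each of those variants is a separate URL a crawler could index unless you tell it otherwise.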

Duplicate Content in WordPress

In WordPress, duplicate content can be reduced quite easily. Adding the following code inside the <head> section of your theme’s header.php file will sort it out:

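<?php
// Index page 1 of the home page, single posts, pages and category archives;
// everything else gets noindex, but its links are still followed.
if ( ( is_home() && $paged < 2 ) || is_single() || is_page() || is_category() ) {
    echo '<meta name="robots" content="index,follow" />';
} else {
    echo '<meta name="robots" content="noindex,follow" />';
} ?>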

Only the home page (its first page only), single posts, static pages and category pages are marked index,follow. Everything else is marked noindex,follow: the content is not indexed, but the links are still followed so that PageRank is passed on. If you are using the All in One SEO Pack plugin, it will automatically add noindex,follow to pages that create duplicate content.
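To check which directive a given page actually ends up with, you can inspect its HTML source. This is a minimal Python sketch using only the standard library’s html.parser; the sample HTML is hypothetical, and in practice you would feed it the source of a page from your own blog.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append(attributes.get("content") or "")


def robots_directives(html: str) -> list[str]:
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical <head> of a date archive page.
sample = '<head><title>2008 archive</title><meta name="robots" content="noindex,follow" /></head>'
print(robots_directives(sample))  # prints ['noindex,follow']
```

An empty list means the page has no robots meta tag at all, so search engines fall back to their default of index,follow.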

Additional measures will also help, such as:

  • Using a robots.txt file in the root of your site to stop search engine crawlers from crawling your feeds, trackbacks and other duplicate URLs, for example:
User-agent: *
Disallow: /comments/feed
Disallow: /date/
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.inc$
Disallow: /*?
Disallow: /wp-admin/
Disallow: /wp-login.php?action=logout
Sitemap: http://www.yoursite.com/sitemap.xml

(Duplicate content also arises from URLs for paged, date, category and tag archives. These could be blocked in robots.txt, but if you use the All in One SEO Pack, it adds a noindex,follow robots meta tag to the head section of these pages instead.)

  • Using the <!--more--> tag to show excerpts on your home page instead of full posts
  • Restricting the number of posts displayed on your home page. (This also avoids the annoying situation on some blogs where every post ever written is loaded on one page, making it so long that it can crash the browser.)
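Rules such as /*.js$ and /*? rely on the * and $ wildcards, an extension that Google and other major crawlers support but that Python’s own urllib.robotparser does not. The following minimal Python sketch shows how such rules match, so you can check which URLs a crawler would skip before uploading the file. It assumes the Disallow rules from the robots.txt example above, ignores Allow lines and per-user-agent groups, and uses hypothetical sample paths.

```python
import re

# The Disallow rules from the robots.txt example.
DISALLOW = [
    "/comments/feed",
    "/date/",
    "/*.js$",
    "/*.css$",
    "/*.inc$",
    "/*?",
    "/wp-admin/",
    "/wp-login.php?action=logout",
]


def rule_to_regex(rule: str) -> re.Pattern[str]:
    """Translate a robots.txt path rule into a regular expression.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL; everything else is matched literally as a prefix.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


PATTERNS = [rule_to_regex(rule) for rule in DISALLOW]


def is_blocked(path: str) -> bool:
    """True if any Disallow rule matches from the start of path."""
    return any(p.match(path) for p in PATTERNS)


for path in ["/2008/09/duplicate-content/", "/date/2008/09/", "/?s=seo", "/wp-includes/js/jquery.js"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Run as is, it reports the date archive, the search URL and the script as blocked and leaves the post URL crawlable. Note that /*? blocks every URL containing a query string, so check it against your own permalink settings.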

Duplication of Title and Meta Information

Giving several pages the same title or meta description is a serious problem and should be avoided at all costs. The title should reflect the contents of the page, and it is so important that it warrants its own article, Basic SEO: writing title and meta information.
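A quick way to find the problem on your own site is to group pages by title. Here is a minimal Python sketch, assuming you already have each page’s URL and title text (for example from a crawl of your site); the page data below is hypothetical.

```python
from collections import defaultdict


def duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by title and return only titles used by more than one URL."""
    by_title: dict[str, list[str]] = defaultdict(list)
    for url, title in pages.items():
        # Normalise whitespace and case so trivially different titles group together.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical crawl results: URL -> <title> text.
pages = {
    "/": "My Blog",
    "/duplicate-content/": "Duplicate Content | My Blog",
    "/page/2/": "My Blog",
    "/date/2008/09/": "My Blog",
}
print(duplicate_titles(pages))
# prints {'my blog': ['/', '/page/2/', '/date/2008/09/']}
```

The same grouping works for meta descriptions: pass in a mapping of URL to description instead.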

Canonical Form URL

It may seem crazy, but search engines treat www.domain.com and domain.com as completely different sites, so there is the potential for a whole website to be duplicated, especially if other websites link to you using the non-www form.

Whichever form you settle on as the preferred one is known as the ‘canonical form’ of the URL. It makes no difference in SEO terms whether you choose the www or the non-www form; both are okay. However, if you have more inbound links to one form than the other, it is usually better to go with the form that has the most inbound links.

Redirecting the other form to your canonical one is quite simple. If you are using Linux and Apache (with mod_rewrite enabled), add the following code to the .htaccess file in the website root:

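# Turn on mod_rewrite
RewriteEngine On
# Permanently (301) redirect any host other than www.yoursite.com to the www form
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
# Standard WordPress permalink rules: send requests for anything that is
# not an existing file or directory to index.php
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]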

On Windows-based servers, the problem is solved by setting up the duplicate website as a redirect in the IIS control panel.
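Whichever server you use, it can help to write down exactly what the redirect should do before changing the configuration. Here is a minimal Python sketch of the intended behaviour (not of Apache or IIS themselves): any request whose host is not the canonical one gets a permanent (301) redirect to the same path and query string on the canonical host. The host www.yoursite.com is the same placeholder used in the .htaccess example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Placeholder host, as in the .htaccess example; replace with your own.
CANONICAL_HOST = "www.yoursite.com"


def canonical_redirect(url: str) -> Optional[tuple[int, str]]:
    """Return (status, location) if url is not on the canonical host, else None.

    The host comparison ignores case (like the [NC] flag); the path and
    query string are carried over unchanged to the canonical host.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    location = urlunsplit(("http", CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return 301, location


print(canonical_redirect("http://yoursite.com/duplicate-content/?p=1"))
# prints (301, 'http://www.yoursite.com/duplicate-content/?p=1')
print(canonical_redirect("http://WWW.YourSite.com/"))
# prints None
```

A permanent redirect is used here because it tells search engines that the non-canonical address has moved for good, so links to it count towards the canonical URL.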

External Duplicate Content

Having dealt with the ways in which a website can generate duplicate content internally, we come to the more obvious way content becomes duplicated: someone copies your content and passes it off as their own. It is extremely easy to write a PHP script that visits a website and takes the content from a page (this is known as scraping). It should not be done without the webmaster’s consent; however, the web has few controls on this, so it happens.

You can find out whether copies of your content exist by using an online service called Copyscape, which searches the web and reports back any pages containing the same content. Including a copyright notice in the footer of your pages tells people that the content is not to be reproduced. If someone does copy your content, you may be able to report the offending website to its ISP or to Google.
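Copyscape does this across the whole web. For a single page you already suspect, a rough check is to look for long sentences of yours that reappear word for word. This minimal Python sketch is an illustration, not how Copyscape works; the eight-word threshold and both sample texts are assumptions.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def normalise(text: str) -> str:
    """Lower-case text and reduce it to single-space-separated words."""
    return " ".join(WORD.findall(text.lower()))


def copied_sentences(original: str, suspect: str, min_words: int = 8) -> list[str]:
    """Return long sentences from `original` that appear word for word in `suspect`."""
    haystack = f" {normalise(suspect)} "
    found = []
    for raw in re.split(r"(?<=[.!?])\s+", original):
        sentence = normalise(raw)
        # Short sentences are too common to prove copying, so skip them.
        if len(sentence.split()) >= min_words and f" {sentence} " in haystack:
            found.append(sentence)
    return found


# Hypothetical texts: a paragraph of yours and the HTML of a suspect page.
original = (
    "Writing original content is time consuming and therefore costly. "
    "Search engines have ways and means of telling whether the content is original."
)
suspect = "<p>Great tips! Writing original content is time-consuming and therefore costly.</p>"

print(copied_sentences(original, suspect))
# prints ['writing original content is time consuming and therefore costly']
```

Because both texts are reduced to plain lower-case words, HTML tags, hyphens and punctuation changes made by a scraper do not hide the match.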


3 Responses to “Duplicate Content”

  1. NoIndex, NoFollow and Robots.txt | SEO The Game | 14/11/08

    [...] to sculpt PageRank,  so that it accrues only to your most important pages or avoiding excessive duplicate content. The interview transcript reveals lots of useful information but it is helpful to summarise that [...]

  2. Absolute Links for SEO | 2/12/08

    [...] and not http://www.seothegame.com/index.html or /index.html as this creates a duplicate content [...]

  3. Hinting at Internal Duplicate Content in Web Sites | 13/02/09

    [...] and Google have agreed to adopt a new standard to highlight URLs that contain duplicate content. Duplicate content issues often occur when the same pages of content is accessed can be reached by multiple URLs or session [...]
