SEO for PDF Documents

PDF and Search Engine Optimisation

In a previous post, we looked at SEO for Flash, but search engines index other formats too. PDF is a popular format in industry and a favourite way of distributing information such as product brochures. Many companies hold a lot of information as PDFs, so how well can search engines read them, and what can be done to optimise them so that they are easier to find?

Search Engines and PDF Readability

The major search engines have algorithms for reading the content in PDF documents.

The data below comes from an interesting study by Tim Nash on how well search engines read Flash and PDF documents.

The results for the PDF sites are as follows:

                 Google   Yahoo   MSN/Live   Ask
Indexed          Yes      Yes     Yes        Yes
File Name        Yes      Yes     Yes        Yes
Term Rank        Yes      Yes     Yes        Yes
Links Indexed    Yes      Yes     Yes/No     No
Links Weighted   Yes      Maybe   Maybe      N/A
Duplicate HTML   Yes      Yes     Yes        Yes
Duplicate PDFs   No       Yes     Yes        Yes

The main points of the study were:

  • All the search engines indexed the PDF documents; however, the PDFs did not rank well when there was a competing HTML document.
  • Links in PDFs are crawled by all the major search engines. The study indicates that most of them also index these links and that link-weight can be passed to other pages. However, because this link-weight is small to begin with, the effect is difficult to measure.
  • Duplicate content scraped from an HTML page and placed in a PDF was indexed without penalty but did not outrank the original HTML. Duplicate PDFs were also indexed by all the search engines except Google, which did not index a duplicate PDF whose meta information was identical to the original's. If the meta information was altered, the duplicate would still rank. Similarly, if the file size was changed by adding extra content, the duplicate would also be indexed.

Optimising PDF Documents

Scanned PDF documents should be processed with Optical Character Recognition (OCR), which turns images of text into real text that search engines can read and can also reduce the size of the document.

As with all filenames, a PDF's filename should describe its content for users. The idea that keywords in the filename act as a ranking factor is plausible, but more experimental evidence is needed.
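As a minimal sketch of that advice, the Python function below turns a document title into a descriptive filename. It assumes lowercase words joined by hyphens, a common convention for web filenames; the function name `descriptive_pdf_filename` and the example title are hypothetical.

```python
import re
import unicodedata


def descriptive_pdf_filename(title: str) -> str:
    """Turn a human-readable title into a descriptive, hyphen-separated PDF filename.

    Hypothetical helper: assumes lowercase ASCII words joined by hyphens is the
    naming convention you want.
    """
    # Decompose accented characters and drop the accents, e.g. "Café" -> "Cafe".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep only runs of letters and digits; spaces and punctuation act as separators.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    if not words:
        raise ValueError("title has no letters or digits to build a filename from")
    return "-".join(words) + ".pdf"


print(descriptive_pdf_filename("Widget 3000 Product Brochure (2009)"))
# prints widget-3000-product-brochure-2009.pdf
```

A name like this tells a visitor what the download contains before they open it, whatever its effect on ranking turns out to be.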

It is also important to add meta information, such as the document's title, subject and keywords, to give additional information to the search engines. Information on how to change the metadata can be found on the Adobe web site.
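The meta information lives in the PDF's document information dictionary, whose standard entries include Title, Author, Subject and Keywords. In practice you would edit these in your PDF software's document properties; the sketch below builds a blank one-page PDF by hand with Python's standard library purely to show where those fields sit in the file. It assumes ASCII-only text, and the function names and sample values are hypothetical.

```python
def pdf_literal(text: str) -> str:
    """Write text as a PDF literal string; backslashes and parentheses must be escaped."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def minimal_pdf_with_metadata(title: str, subject: str, keywords: str) -> bytes:
    """Hypothetical helper: build a blank one-page PDF whose document information
    dictionary carries Title, Subject and Keywords. Assumes ASCII-only text
    (encode("ascii") raises UnicodeEncodeError otherwise)."""
    info = (
        f"<< /Title {pdf_literal(title)} /Subject {pdf_literal(subject)} "
        f"/Keywords {pdf_literal(keywords)} >>"
    )
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        info,  # object 4: the document information dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")  # each entry is exactly 20 bytes
    # The trailer's /Info entry is what points readers and crawlers at the metadata.
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


print(minimal_pdf_with_metadata(
    "Widget 3000 Product Brochure",
    "Features and specifications of the Widget 3000",
    "widget, brochure, specifications",
).decode("ascii"))
```

Writing the returned bytes to a file and opening its document properties should show the three fields; with a real brochure, the same entries are simply filled in through your PDF editor instead.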

Do PDFs have a benefit for SEO?

Absolutely. Even though PDF documents do not carry much weight and take a long time to rank (between 2 and 6 weeks, depending on how often the robots visit), it is still worthwhile to include them on the site. The additional content will help the site appear in long-tail searches. PDFs also give your visitors information that they can refer to at a later date.

If you already have PDF documents for your products, giving visitors access to them on your website requires minimal effort. Duplicating text between the website and a PDF should not be a problem: in the study, the HTML page outranked a PDF with the same content. Because both are indexed, the same content can be published in HTML, which is convenient for site visitors, and as a PDF for those who want to download the information and read it later.


Extra Products or Services That May Help
who needs Dietary Supplements
Leaving cert grinds
France SEO
Bookmark and Share

Tags: ,

Share Your Thoughts

// //]]>