Crawling AJAX Sites

October 9th, 2009 by Carl | Filed under Google, Search Engine Results, Search Engines.

AJAX (Asynchronous JavaScript And XML) is a combination of JavaScript and XML that processes information without having to refresh the page. It is used increasingly in web applications and many websites, but search engines have great difficulty crawling sites that use AJAX because their crawlers do not execute JavaScript.

Since AJAX is a modern and useful feature for website visitors, there is a considerable need to be able to crawl this information.

Sites that use AJAX produce URLs that look like http://www.example.com/#state1. Everything after the octothorpe (#) is meaningless to a crawler, because that part of the URL is handled by the browser and is never sent to the server.
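
To illustrate why the part after the octothorpe does not help a crawler, here is a minimal Python sketch. It uses the standard library's urllib.parse to split a URL into the part a client would put in its HTTP request and the fragment that stays on the client side. The helper names request_target and fragment are made up for this illustration, and the second URL (#state2) is a hypothetical extra state.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query string, if any) that a client would request.

    The fragment (the part after '#') is deliberately left out, because
    clients do not send it to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def fragment(url: str) -> str:
    """Return only the fragment, i.e. the client-side state identifier."""
    return urlsplit(url).fragment


# Two different AJAX states (the second one is a hypothetical example).
urls = [
    "http://www.example.com/#state1",
    "http://www.example.com/#state2",
]

for url in urls:
    print("request:", request_target(url), "| fragment:", fragment(url))
```

Both URLs lead to the same request, so a crawler that does not run JavaScript receives the same page for every state.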

Google have been researching methods to crawl AJAX sites and have developed an approach in which the site's own server runs its JavaScript at the time the crawler visits, via a headless browser. (This is a browser that executes and processes the code on the site in the same way as a visitor's browser would, including all the JavaScript content, but does not show the output on screen.) Finally, the crawler can take a snapshot of the processed page, from which it can extract the URLs required to make the site work for visitors.
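
The last step, pulling URLs out of a snapshot, can be sketched roughly as follows. This is only an illustration and not Google's actual implementation: the SNAPSHOT string is a made-up stand-in for the HTML a headless browser might produce after running the page's JavaScript (the rendering step itself is not shown), and LinkExtractor and extract_links are hypothetical names. It uses Python's standard html.parser module.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links and bare fragments into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(snapshot_html: str, base_url: str) -> list[str]:
    """Return all link targets found in an already-rendered HTML snapshot."""
    parser = LinkExtractor(base_url)
    parser.feed(snapshot_html)
    parser.close()
    return parser.links


# Hypothetical snapshot: what the page might look like after its
# JavaScript has run and inserted the navigation links.
SNAPSHOT = """
<html><body>
  <a href="#state1">First state</a>
  <a href="/about">About</a>
</body></html>
"""

for link in extract_links(SNAPSHOT, "http://www.example.com/"):
    print(link)
```

The important point is that the links only exist in the snapshot because the JavaScript was executed first; the raw page source might not contain them at all.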

The good thing about this is that the crawler will see the same thing as a visitor and so everyone is happy.

Because the site needs to run its own code for the crawler, this would be an opt-in method in which the site says which states the crawler is allowed to index. Unless it is going to tax your server too much, who is going to mind? Especially if it gets more of your site's information indexed.

For more information visit the Official Google Webmaster Central Blog.


