How do Search Engines Work?
When we type a query into a search engine such as Google, how many of us think about what goes on behind the scenes? Any SEO specialist ought to have a basic understanding of how a query gets transformed into a list of results.
In the early days of the web, a search engine consisted of a list of a few thousand sites. There were so few websites that it was enough just to list pages of all the sites found by a simple crawler. With the explosion in the number of websites after 1994, the task of indexing the web became a lot more difficult.
Even the simplest web search engine has to do three main tasks:
- find information on the web
- store the information in a compressed form
- display information relating to the search query so that the most relevant information is displayed first.
Web Crawler
The crawler, also known as a spider or robot, is the part of the search engine that finds information and follows links to other pages. For a crawler to find a page, some other page it already knows about must link to it. Without such a link your page would never be found. As the crawler spiders the web, it reads the content of each page and sends the information back to a database to be indexed.
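The link-following behaviour described above can be sketched as a breadth-first traversal. This is only an illustration: `TOY_WEB`, its URLs, and the `crawl` function are made up for the example, and a real crawler would fetch pages over HTTP, parse the HTML for links, and respect robots.txt.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/post"],
    "a.example/about": ["a.example/home"],
    "b.example/post": ["c.example/page"],
    "c.example/page": [],
    "d.example/orphan": ["a.example/home"],  # no other page links here
}

def crawl(seed, web):
    """Breadth-first crawl from `seed`; returns URLs in the order discovered."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real crawler would send the page content to the indexer here
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

discovered = crawl("a.example/home", TOY_WEB)
```

Note that `d.example/orphan` never appears in `discovered`: it links out to other pages, but nothing links to it, so the crawler has no way to reach it.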
Indexing
The information must be coded in such a way that it can be retrieved efficiently. Searching through the entire content of the internet for keywords on every query would be very inefficient. For this reason the text is preprocessed first: punctuation is removed, along with common words such as as, of, to, the, etc. The words must also be 'stemmed' so that words like action, actions, actioning and actioned are treated as being the same word. Finally, an inverted table (also called an inverted index) is created, which links each word to the documents in which it occurs.
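A minimal sketch of these indexing steps follows. The stop-word list, the sample documents, and the `stem` function are assumptions made for the example; in particular, `stem` is a deliberately crude suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real engines use longer, language-specific lists.
STOP_WORDS = {"as", "of", "to", "the", "a", "an", "and", "in", "is"}

def stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, drop punctuation, remove stop words, then stem what is left."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(docs):
    """Build the inverted table: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# Hypothetical documents keyed by id.
DOCS = {
    1: "The actions of the crawler.",
    2: "Actioning a request to the index",
    3: "Crawlers index the web",
}
index = build_index(DOCS)
```

Here "actions" and "Actioning" both end up under the single term `action`, so a search for either form finds documents 1 and 2, while "the" is never stored at all.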
Information Retrieval
When a query is made, the search engine looks at the inverted table for each of the keywords and extracts a list of documents in which all the keywords appear. When it has this list it is able to apply a ranking algorithm to choose the order in which to display them.
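The lookup step can be sketched as a set intersection over the inverted table. The `INDEX` contents and the `retrieve` function below are hypothetical, and the sketch assumes the query terms have already been normalised in the same way as the indexed words.

```python
# Hypothetical inverted table: term -> ids of the documents containing it.
INDEX = {
    "search": {1, 2, 4},
    "engine": {1, 2, 3},
    "crawler": {2, 3},
}

def retrieve(query_terms, index):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    # Start from the shortest list so the intermediate result stays small.
    postings.sort(key=len)
    result = set(postings[0])
    for doc_ids in postings[1:]:
        result &= doc_ids
    return result

two_terms = retrieve(["search", "engine"], INDEX)
three_terms = retrieve(["search", "engine", "crawler"], INDEX)
no_match = retrieve(["search", "missing"], INDEX)
```

The resulting set is unordered; deciding which of the matching documents to show first is the job of the ranking algorithm.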
One of the reasons different search engines produce different results is the way they order the list of selected results from the inverted index. Google is reported to use more than 200 different factors to arrive at its ranking. Some are obvious indicators of relevancy: the number of times the key phrase is mentioned, whether the keywords occur together or in separate places in the content, and whether the keywords appear in the title, headings, etc. Other factors are concerned with determining the quality of the site. These include PageRank, an algorithm that orders pages according to the number and weight of incoming links, and the Hilltop Algorithm, which takes into account whether a page has links from certain authority sites that it considers to be trusted. Still other factors refine the search results to take personal search preferences into account.
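To make the idea of "number and weight of incoming links" concrete, here is a sketch of the simplified textbook form of PageRank, computed by repeated iteration. It is not Google's production ranking: the link graph, the damping value of 0.85, and the iteration count are assumptions chosen for the example, and every linked page is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked to by A, B and D; nothing links to D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this graph C comes out on top because it has the most incoming links, and A ranks second even though only one page links to it, because that one link comes from the highly ranked C. D, with no incoming links, comes last.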
The speed of the results belies the complexity of the search engine algorithm. A single query is processed by hundreds of separate computers to make the search manageable, and the result is displayed almost instantly. It is a wonderful process.
Tags: Google, search engine operation

