The Hilltop Algorithm

December 1st, 2008 by Carl | Filed under Algorithms, Google, Search Engine Results.

Introduction

The PageRank algorithm is a global ranking metric used to determine the overall quality of a page. The Hilltop algorithm, by contrast, determines the quality of a result based on its relevance to the query. The original paper is available. The algorithm addresses a problem with broad search terms, which return large sets of documents that all have to be ranked: it ranks them by checking them against expert sites, which are selected using the terms in the query.

An expert site is one that is about a particular topic and has many one-way links to non-affiliated sites on that topic. Links between affiliated sites are discounted. (Sites are deemed affiliated if they share the same IP address information or the same domain.)
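The affiliation test can be sketched in a few lines of Python. This is only an illustration: the function names are made up, "same IP address information" is assumed to mean the same /24 block, and "same domain" is approximated by comparing the last two labels of the hostname, which is too crude for suffixes such as .co.uk.

```python
import ipaddress


def same_ip_block(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if both addresses fall in the same block (assumed here to be a /24)."""
    block = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in block


def naive_domain(hostname: str) -> str:
    """Crude stand-in for 'the same domain': the last two labels of the hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def affiliated(host_a: str, ip_a: str, host_b: str, ip_b: str) -> bool:
    """Two sites count as affiliated if either test matches."""
    return same_ip_block(ip_a, ip_b) or naive_domain(host_a) == naive_domain(host_b)


# Same domain, different IP blocks: affiliated.
print(affiliated("www.example.com", "192.0.2.10", "shop.example.com", "198.51.100.7"))
# Different domains and different IP blocks: not affiliated.
print(affiliated("a.example.org", "192.0.2.10", "b.example.net", "198.51.100.7"))
```

A link between two sites that pass this test would simply not be counted when deciding whether a page qualifies as an expert.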

It seems likely that the Hilltop algorithm is only run on certain popular keywords with a pre-compiled set of expert pages, such as the DMOZ directory or the Yahoo directory, as it requires a large amount of processing to compile the list of authority sites. From the original paper:

For an expert page to be useful in response to a query, the minimum requirement is that there is at least one URL which contains all the query keywords in the key phrases that qualify it. A fast approximation is to require all query keywords to occur in the document. Furthermore, we assign to each candidate expert a score reflecting the number and importance of the key phrases that contain the query keywords, as well as the degree to which these phrases match the query.

Thus, we compute the score of an expert as a triple of the form (S0, S1, S2). Let k be the number of terms in the input query, q. The component Si of the score is computed by considering only key phrases that contain precisely k – i of the query terms. E.g., S0 is the score computed from phrases containing all the query terms.

Si = Σ{key phrases p with k – i query terms} LevelScore(p) * FullnessFactor(p, q)

LevelScore(p) is a score assigned to the phrase by virtue of the type of phrase it is. For example, in our implementation we use a LevelScore of 16 for title phrases, 6 for headings and 1 for anchor text. This is based on the assumption that the title text is more useful than the heading text, which is more useful than an anchor text match in determining what the expert page is about.

FullnessFactor(p, q) is a measure of the number of terms in p covered by the terms in q. Let plen be the length of p. Let m be the number of terms in p which are not in q (i.e., surplus terms in the phrase). Then, FullnessFactor(p, q) is computed as follows:

  • If m <= 2, FullnessFactor(p, q) = 1
  • If m > 2, FullnessFactor(p, q) = 1 – (m – 2) / plen
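The two cases above translate directly into a small function. This is a minimal sketch, assuming phrases and queries have already been split into lower-case terms; the function name and the tokenisation are my own choices, not taken from the paper.

```python
def fullness_factor(phrase_terms, query_terms):
    """FullnessFactor(p, q) as described in the quoted rules.

    phrase_terms: list of terms in the key phrase p
    query_terms:  list of terms in the query q
    """
    query = set(query_terms)
    plen = len(phrase_terms)
    # m is the number of surplus terms: terms in the phrase that are not in the query.
    m = sum(1 for term in phrase_terms if term not in query)
    if m <= 2:
        return 1.0
    return 1.0 - (m - 2) / plen


# One surplus term ("london"), so no penalty.
print(fullness_factor(["used", "cars", "london"], ["used", "cars"]))
# Five surplus terms in a seven-term phrase: 1 - 3/7.
print(fullness_factor("cheap used cars for sale in london".split(), ["used", "cars"]))
```

In other words, a phrase may carry up to two extra words for free, and after that each extra word costs a share of the score proportional to the phrase length.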

Our goal is to prefer experts that match all of the query keywords over experts that match all but one of the keywords, and so on. Hence we rank experts first by S0. We break ties by S1 and further ties by S2. The score of each expert is converted to a scalar by the weighted summation of the three components: Expert_Score = 2^32 * S0 + 2^16 * S1 + S2
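Putting the quoted pieces together, the expert score could be computed roughly as follows. This is a sketch rather than the paper's implementation: the weights are read as the powers of two 2^32 and 2^16, the data layout (a list of (kind, terms) pairs) and the function names are assumptions, and phrases that contain no query term at all are skipped, which the quoted text does not spell out.

```python
# LevelScore values quoted from the paper.
LEVEL_SCORE = {"title": 16, "heading": 6, "anchor": 1}


def fullness_factor(phrase_terms, query_terms):
    """1 if the phrase has at most two surplus terms, else 1 - (m - 2) / plen."""
    query = set(query_terms)
    m = sum(1 for term in phrase_terms if term not in query)
    return 1.0 if m <= 2 else 1.0 - (m - 2) / len(phrase_terms)


def expert_score(key_phrases, query_terms):
    """Return ((S0, S1, S2), scalar) for one candidate expert page.

    key_phrases: list of (kind, terms) pairs, kind being a key of LEVEL_SCORE
    """
    query = set(query_terms)
    k = len(query)
    s = [0.0, 0.0, 0.0]
    for kind, terms in key_phrases:
        matched = len(query & set(terms))
        i = k - matched  # the phrase contains precisely k - i query terms
        # Assumption: ignore phrases with no query term, and those missing more than two.
        if matched == 0 or i > 2:
            continue
        s[i] += LEVEL_SCORE[kind] * fullness_factor(terms, query_terms)
    scalar = 2**32 * s[0] + 2**16 * s[1] + s[2]
    return tuple(s), scalar


phrases = [
    ("title", ["the", "hilltop", "algorithm"]),  # both query terms -> S0
    ("heading", ["hilltop", "ranking"]),         # one query term   -> S1
    ("anchor", ["algorithm"]),                   # one query term   -> S1
]
triple, scalar = expert_score(phrases, ["hilltop", "algorithm"])
print(triple, scalar)
```

Note that the weighted sum only reproduces the "S0 first, then S1, then S2" ordering while S1 and S2 each stay below 2^16, which is easily the case for realistic pages.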

What does this mean for SEO?

The first point is that you should try to obtain links from web sites which are considered to be authority sites in the field. Look in the DMOZ directory and Yahoo directory for related websites in the same field, or in a connected field. Another method is to look at the sites that are returned for broad searches: if the Hilltop algorithm is running, these sites will have links from authority sites. Check their links using the link: operator in Google, even though this is slightly unreliable, or alternatively look for links in Yahoo Site Explorer. (It doesn’t tell you whether Google has the link, but it may help to find good quality sites.)


