The HITS Algorithm

November 10th, 2008 by Carl | Filed under Algorithms, Search Engine Results, Search Engines, mathematics.

Introduction

If you are serious about SEO, then it is important to know how different search engine algorithms work. In this post we look at the Hypertext-Induced Topic Selection (HITS) invented by Jon Kleinburg, 1998 [1]. It also goes under the name of the CLEVER (Clientside EigenVector Enhanced Retrieval) algorithm as implemented in a prototype IBM search engine. It is thought that search engine Teoma,  latter acquired by Ask, used a method based on the HITs algorithm.

HITS is query dependent and calculated at the time of the search. Pages are given two scores, a hub-score which refers to the number of links pointing from the page an authority-score which depends on the number of pages for the number of pages with links pointing to it.

Hubs and Authority Pages

How HITS Works

The algorithm starts by using a text-based search engine to gather the top r results for a query σ this called the root set R. From this small number of sites, a further set of pages S is gathered if they link to or if they point to a page in R.

The set of pages S expanded from the root set R

For each page p in R that links to a page q, a weight factor a is given for its authority A and a second weighting factor h for its hubness H.  A good hub points to many good authority sites. While a good authority site links to many good hubs. Therefore these two scores are inter-related. Given a set of weights {a} {h} the weights are updated using the following relationships to iterate the weighting factors.
O operation:

apΣq:(q,p)∈Shq

I operation:

hpΣq:(q,p)∈Saq

At each iteration, the authority weights and hub weights are normalised according to.

ΣpSap2 = 1
ΣpShp2 = 1

The set of weights {h} are collected to form an S dimensional vector h with each weight forming a component. Similarly, the set of authority weights {a} also form a vector a
Writing the algorithm formally:

Iterate(S,k)
S: a collection of n linked pages
k: a natural number
Let z denote the vector (1, 1, 1,.. 1) ∈ ℜn
Set a0 := z
Set h0 := z
For k = 1,2,… z
Apply the I operation to (ai−1; hi−1), obtaining new a-weights.
Apply the O operation to (ai, hi−1), obtaining new h-weights hi
Normalise ai, obtaining ai.
Normalise hi, obtaining hi.
End
Return (hk; ak).

The pages with the  top 5-10 highest authority and hub scores are returned.

References

[1] J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Proc. 9th ACM-SIAM Symposium on
Discrete Algorithms, 1998, extended version in Journal of the ACM 46(1999), also appears as IBM Research
Report RJ 10076, May 1997.

Extra Products or Services That May Help
Information on bailiffs
Are you looking for Baby Gift Sets at great prices then come to us.
Beautiful Wedding Rings of the highest quality.
If you are looking for an Wedding coordinator then come to us.
Wedding Cars West Sussex here
Bookmark and Share

Share Your Thoughts

// //]]>