Calculating PageRank

September 18th, 2008 by Carl | Filed under PageRank, mathematics.

Introducing PageRank

The one thing that made Google the most successful search engine in the world was way in which it chooses the results to show it users. PageRank is the method developed by Larry Page in 1995 and applied to the Google search engine in 1998. The method of sorts results depending on an algorithm with takes into account the number of links pointing toward a particular page. The basic premiss behind it being that a page with a greater number of links from similarly authoritative sites is likely to be a web site which has content related to your search query. In the early days of the web, the fact that someone linked their page to yours could be considered a mark of the quality of your web page. Today, this distinction is not so clear and so it is just one of many factors that go into the ranking of a web page in the Google algorithm.

How PageRank Works

Any web site can be considered to be a network of points or nodes which represent each page with links pointing to an leaving from pages with links. We can write the links going from node to node as a matrix. If their are n nodes or pages in a website, the dimensions of the matrix would be n x n.
In this matrix, where one node xa has a link to another node xb we can change the element at xab to a 1 otherwise it remains a 0.
This is known as the Adjacency Matrix because tells you which pages link together. PageRank considers the each non-zero matrix element in the adjacency matrix xij and assigns it a value 1/(Σj xij).

This matrix is called the transition matrix and represents the probability that a surfer will go to a linking page from a given web page. If there are 3 links on page x then the probability of a person on that page going to another link is 1/3.

What about the pages that have no links coming from them? There is 0 probability that they will be followed because there are no links to them. However we can do better than this. We can use suppose that sometime the surfer will decide not to follow from the current page but type in another URL from the site. The probability with which they do this is controlled by an adjustable parameter, c usually set to 0.85, meaning the 85% of the time they will follow a link on the page but there is a 15% chance that our surfer will teleport to another page within the site.
So instead of zero probability, a page without links has a 1/n chance of being seen by our surfer. The equation for the transition matrix G becomes:

G=(1/n)(1-c)M+cA

Where M is a n x n matrix with all the elements containing 1. So How do we find the PageRank from this?
This is basically a system of linear equations which we can solve analytically for very small matrices. This results in n eigenvalues which form components of the n eigenvector, π.
There are many ways of solving this equation directly using analytical methods. However, the real Google PageRank transition matrix would be of the order 20 billion pages or some 400 billion elements in G. The power method is a good way to compute the PageRank.
The power method can use random values for the components of the eigen vector π0. Plugging these values into the equation:
π i+1 =i.
gives a new set of component for π. Repeated application, using the new results for the next cycle causes the eigenvalues to converge. The iteration process continue until the desired accuracy is reached.

Other Methods for Approximating PageRank

Other methods for calculating the PageRank include, the Monte-Carlo method . A  journey is made through the site following links until a page is reached where there are no more links to follow. This process is repeated many times and the  the frequency with which a particular page is visited is related to its importance in the web site. Monte-Carlo techniques can give a fast and accurate approximation to the PageRank for pages that are important in the web site.
Another interesting paper finds analogies between the PageRank algorithm and solving the Schrodinger equation, used so often in quantum mechanics.
My own interest lies in calculating the local PageRank and relative PageRank as well as ways of improving the PageRank of individual pages by modifying the internal structure of the site for SEO purposes.

Extra Products or Services That May Help
Adjustable Beds at great prices.
If your looking for adjustable bed bedding then look no further.
Beds at great prices.
childrens beds here
Bookmark and Share

Tags: , , , ,

4 Responses to “Calculating PageRank”

  1. How do Search Engines Work? | 25/11/08

    [...] headings, etc. Other factors are concerned with the determining the quality of the site including PageRank, an algorithm that orders pages according to the number and weight of incoming links,  and the [...]

  2. Toolbar PageRank Update, 12/2008 | 2/01/09

    [...] On 31st December, Matt Cutts confirmed the latest update to toolbar  PageRank. SEO the Game has gone up to a PR-4 from PR-0, however, this is only for the first [...]

  3. Optimizing, Optimising Ecommerce Web Sites Strategies | 5/03/09

    [...] the more important product category pages. PageRank sculpting as is useful for distributing any PageRank that a page has. If your pages have a low PR then this is not going to do [...]

  4. TunkRank,How Influential are you on Twitter? | 18/03/09

    [...] measures your influence in the Twitter world. For those that are familiar with PageRank it uses a similar concept of traversing the directed graph the graph of  follower and those [...]

Share Your Thoughts

// //]]>