Saturday, December 25, 2004

Search Engine Optimization - how is Google PageRank calculated?

One of the main things that made Google the dominant search engine it is today is its use of a proprietary formula called PageRank.

The "page" in "PageRank" actually comes not from "web page" but from "Larry Page", one of the founders of Google. However, since it is used to rank the importance of a page, you can think of it either way.

While they were at Stanford University, Larry Page and Sergey Brin developed a complex formula to determine, all other things being equal, which of several (or several thousand) web pages is more important and should therefore rank higher in search results.

If you're interested in search engines and their history, a fascinating historical document is the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which the Google founders wrote while they were at Stanford University. It explains many parts of the algorithm that eventually became the Google search engine. Of course, the algorithm changes frequently, so you can't take every part of the original paper literally. Still, it gives good insight into what matters in search engine optimization even today.

The main criteria in the formula behind PageRank are the number of incoming links to your page, how important those linking pages are, and how many links there are on the pages they come from.
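For reference, the original paper writes the formula roughly as follows, where T1 through Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is the damping factor discussed further below (0.85 in the paper). All three criteria appear in it: how many T's there are, their own PR, and their C.

```latex
\[
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
\]
```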

PageRank is how importance is measured, and it is shown on the Google toolbar on a scale from 0 to 10, with 10 being the highest. If you read the paper linked to above, you'll notice that the PageRank of all indexed pages on the web averages out to 1. That doesn't necessarily mean that a PageRank of 1 corresponds to a Toolbar PageRank of 1; I'm not sure anyone outside Google knows the true correspondence. Perhaps the average web page shows 3 on the toolbar, but it doesn't really matter, because every page's rank is relative to every other page on the 'net.
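To see why the average comes out to 1, here is a minimal Python sketch that repeatedly applies the paper's formula to a tiny, made-up four-page web until the values settle. The page names and the `pagerank` helper are hypothetical, and this is not Google's actual implementation. The sketch assumes every page has at least one outbound link. A page with none would pass nothing on, and the average would drop below 1.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages T linking to A.

    links maps each page to the list of pages it links out to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # start every page at 1
    for _ in range(iterations):
        incoming = {page: 0.0 for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                # Each page splits its PR evenly across its outbound links.
                incoming[target] += pr[page] / len(outlinks)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr

# Hypothetical four-page web; every page has at least one outbound link.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(web)
average = sum(ranks.values()) / len(ranks)
print({page: round(value, 3) for page, value in ranks.items()})
print(round(average, 6))  # 1.0
```

Page D has no incoming links, so it settles at the minimum of 1 - d = 0.15. The PR it passes on to C is exactly balanced across the web, which keeps the total at four pages' worth.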

In this discussion we'll use Toolbar PageRank numbers. A PageRank of 10 is not twice as good as a 5, because Google uses some sort of logarithmic scale, where the difference between 0 and 1 is less than the difference between 1 and 2, 2 and 3, and so on. Some have guessed the base of the logarithm to be 4, some 6, some 10; again, it's probably not terribly significant. Maybe it's 3.764; I don't know. It may even change monthly to make sure that the original premise, that the average page is ranked PR1, still holds.
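A sketch of how such a logarithmic scale could map an absolute score onto the 0 to 10 toolbar, assuming the guessed bases above. The `toolbar_pr` function is hypothetical, as is the idea that Google uses a clean integer base at all. It counts powers of the base with integer arithmetic to avoid floating-point edge cases at the bucket boundaries.

```python
def toolbar_pr(points, base=4):
    """Map an absolute score to a 0-10 toolbar value on a hypothetical log scale.

    PR n covers base**n up to base**(n + 1) - 1 points; scores below base are PR0,
    and everything from base**10 upward is capped at PR10.
    """
    pr, threshold = 0, base
    while points >= threshold and pr < 10:
        pr += 1
        threshold *= base
    return pr

# The same four scores under three guessed bases.
for base in (4, 6, 10):
    print(base, [toolbar_pr(p, base) for p in (3, 50, 5000, 2_000_000)])
# 4 [0, 2, 6, 10]
# 6 [0, 2, 4, 8]
# 10 [0, 1, 3, 6]
```

The toolbar numbers shift with the base, but the ordering of the pages never changes. That is why the exact base matters less than it first seems.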

The "secret" formula is such that an incoming link from a PR5 page is usually more valuable than one from a PR4. I say usually because the number of links on the linking page is also taken into consideration. In other words, if yours is the only link on the PR4 page, you get 100% of the transferred value (adjusted for the "damping factor" explained further below), while if you get a link from a PR5 that links to 100 pages, you get only 1% of its value. So let's assume the logarithmic base is 4, which means a PR5 is 4 times more powerful than a PR4. If the choice is 1% of the PR5's value or 100% of the PR4's value, I'd prefer the PR4 link: 1% of four times the value is only 4% of what the PR4 link gives you.
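The comparison above is simple arithmetic. This short sketch assumes the post's guessed base of 4 and ignores the damping factor. The relative values and the `share` helper are illustrative only.

```python
# Hypothetical relative strengths, assuming a log base of 4:
# a PR5 page carries 4 times the value of a PR4 page.
BASE = 4
pr4_value = 1.0
pr5_value = pr4_value * BASE

def share(page_value, outbound_links):
    """Value passed along one link: the page's value split evenly across its
    outbound links (the damping factor is ignored here)."""
    return page_value / outbound_links

from_pr5 = share(pr5_value, 100)  # PR5 page with 100 outbound links
from_pr4 = share(pr4_value, 1)    # PR4 page whose only link is yours
print(from_pr5, from_pr4)  # 0.04 1.0
```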

But hold on. All PR4s are not equal, and all PR5s are not equal. Each toolbar PR covers a range of absolute values. For example, if the base of the logarithm is 4, the chart looks like this:

Toolbar PR   Absolute "points"
PR0          0 - 3
PR1          4 - 15
PR2          16 - 63
PR3          64 - 255
PR4          256 - 1023
PR5          1024 - 4095
PR6          4096 - 16383
PR7          16384 - 65535
PR8          65536 - 262143
PR9          262144 - 1048575
PR10         1048576 or more

where a "point" is a unit of the absolute link-importance value that Google calculates.
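The chart follows a simple rule: PR n starts at base to the power n. The hypothetical `pr_ranges` function below regenerates it for any assumed base, so you can see how the buckets would shift if the base were 6 or 10 instead of 4.

```python
def pr_ranges(base=4, max_pr=10):
    """Return {toolbar PR: (low, high)} point ranges for an assumed log base.

    high is None for the top bucket, which is open-ended.
    """
    ranges = {0: (0, base - 1)}
    for pr in range(1, max_pr):
        ranges[pr] = (base ** pr, base ** (pr + 1) - 1)
    ranges[max_pr] = (base ** max_pr, None)
    return ranges

for pr, (low, high) in pr_ranges().items():
    upper = "or more" if high is None else f"- {high}"
    print(f"PR{pr:<3} {low} {upper}")
```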

So, looking at the chart, a high PR4 is almost as powerful as a low PR5 in terms of the link importance transferred to you, and a high PR5 is much, much stronger than a low PR4. (But if the PR5 has 100 links on it, I'd still prefer a link from a PR4 that has no other links.)
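Plugging the chart's endpoints in makes both claims concrete. This assumes the base-4 chart, which is itself a guess, and again ignores damping.

```python
# Chart endpoints for PR4 and PR5, assuming a log base of 4.
high_pr4, low_pr5 = 1023, 1024
low_pr4, high_pr5 = 256, 4095

print(low_pr5 - high_pr4)  # 1: a high PR4 is almost a low PR5
print(high_pr5 / low_pr4)  # 15.99609375: a high PR5 is roughly 16x a low PR4
# ...but split across 100 outbound links, the high PR5 passes on far less
# than a low PR4 that links only to you.
print(high_pr5 / 100, low_pr4 / 1)  # 40.95 256.0
```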

PageRank is transferred from a sending page (outbound link) to a receiving page (inbound link), but according to the original formula in the paper linked above, not all of it is transferred. There is a "damping factor", set to 0.85 in the paper (it may have changed since then), which means that only 85% of the PageRank is transferred, divided equally among the outbound links. So, if a PR4 page has an absolute value of 500 points, it transfers 85% of its 500 points, which is 425 points. If yours is the only page it links to, you get all 425 points. If just 3 similar PR4 pages link to you (and nobody else does), your page will become a PR5 from those links alone (if the base of the logarithm is 4), since 3 x 425 = 1275 points, above the 1024 needed. If the PR4 page links to 10 pages, you get only 42.5 "points" from it, so it would take many more incoming links for you to become a PR4 yourself, let alone a PR5. Lesson: choose your link partners wisely.
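The damping arithmetic above can be sketched as follows. The sketch uses the paper's d = 0.85 and the base-4 thresholds from the chart. Like the example in the text, it leaves out the formula's small (1 - d) baseline term. The `passed_on` helper is hypothetical.

```python
import math

DAMPING = 0.85        # value from the original paper; it may have changed since
PR5_THRESHOLD = 1024  # lower bound of PR5 in the base-4 chart
PR4_THRESHOLD = 256   # lower bound of PR4 in the base-4 chart

def passed_on(points, outbound_links, d=DAMPING):
    """Points one link receives: d times the page's points, split evenly."""
    return d * points / outbound_links

single = passed_on(500, 1)  # a 500-point PR4 page linking only to you
print(single)                       # 425.0
print(3 * single >= PR5_THRESHOLD)  # True: 1275 points is enough for PR5

diluted = passed_on(500, 10)  # the same page with 10 outbound links
print(diluted)                              # 42.5
print(math.ceil(PR4_THRESHOLD / diluted))   # 7 such links just to reach PR4
```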

It's also important to know that the sending page does not lose its PageRank when it links out, so you shouldn't be greedy about keeping your pages isolated from the world. If you don't link to other pages, you're just throwing your voting power away, so at the very least, link to one or more pages on your own site to give them some additional PageRank points.
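Here is a small check of this under the paper's formula. The sketch uses a made-up three-page web where X links to A, and compares A keeping its vote with A linking out to B. The page names and the `pagerank` helper are hypothetical. In the formula, a page's PR depends only on its incoming links, so A's value is the same either way, and only B's changes.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to its list of outbound links; a page with no
    outbound links simply passes nothing on.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        incoming = {page: 0.0 for page in links}
        for page, outlinks in links.items():
            for target in outlinks:
                incoming[target] += pr[page] / len(outlinks)
        pr = {page: (1 - d) + d * incoming[page] for page in links}
    return pr

# Hypothetical pages: X links to A; compare A keeping its vote vs. linking to B.
isolated = pagerank({"X": ["A"], "A": [], "B": []})
linked = pagerank({"X": ["A"], "A": ["B"], "B": []})

print(round(isolated["A"], 4), round(linked["A"], 4))  # 0.2775 0.2775
print(round(isolated["B"], 4), round(linked["B"], 4))  # 0.15 0.3859
```

This holds because nothing links back to A here. In a web with cycles, A's outbound link could eventually feed value back to A, but linking out never subtracts from A directly.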

I highly recommend that you download the Google toolbar and install it in your browser, so that you know the PR of the pages you're interested in exchanging links with, as well as your own PR. You can get the Google toolbar here.

That's all for today. I urge you to read the paper linked at the beginning of this post, get the toolbar, and reread this post once or twice if you found it a bit overwhelming. An understanding of these concepts is essential to becoming a successful search engine optimizer.
