Search Engines: Pageranks and the Incredible Lightness Of Being

Yahoo has just released its Top Searches of 2006 list, leaving us with the impression that people use it only to find information on Britney, Shakira or Paris Hilton. Nevertheless, George W., the N.Y. Yankees, Spider-Man and American Idol rank near the top as well.

Recently the American Mathematical Society featured an article with an in-depth explanation of the mathematics behind Google’s PageRank. The story, titled How Google Finds Your Needle in the Web’s Haystack, points out that because roughly 95% of the text in the 25 billion web pages indexed by Google is composed of a mere 10,000 words, determining relevance requires extremely sophisticated methods.

…Brin and Page introduced Google in 1998, a time when the pace at which the web was growing began to outstrip the ability of current search engines to yield usable results. At that time, most search engines had been developed by businesses who were not interested in publishing the details of how their products worked. In developing Google, Brin and Page wanted to “push more development and understanding into the academic realm.” That is, they hoped, first of all, to improve the design of search engines by moving it into a more open, academic environment. In addition, they felt that the usage statistics for their search engine would provide an interesting data set for research. It appears that the federal government, which recently tried to gain some of Google’s statistics, feels the same way.

There are other algorithms that use the hyperlink structure of the web to rank the importance of web pages. One notable example is the HITS algorithm, produced by Jon Kleinberg, which forms the basis of the Teoma search engine. In fact, it is interesting to compare the results of searches sent to different search engines as a way to understand why some complain of a Googleopoly…
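To give a feel for what "using the hyperlink structure of the web to rank pages" can mean, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production code and is not taken from the AMS article: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the four-page `toy_web` are all our own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page name to the list of pages it links to.
    The damping factor and iteration count are assumed defaults.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages without outgoing links hand their rank to everyone equally.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its rank evenly among the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank

    return rank


# A made-up four-page web; the page names are hypothetical.
toy_web = {
    "home": ["games", "about"],
    "games": ["home"],
    "about": ["home", "games"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:7s} {score:.3f}")
```

In this toy web, "home" collects links from every other page and should come out on top, while "orphan", which nobody links to, is left with only the baseline share. The scores always sum to 1, so each one can be read as the chance that a random surfer is on that page.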

Encouraged by the article, we had a look at how our most popular story so far (Top 10 time waster games) is ranked by Google.

It turned out that Duvet-Dayz.com has a pagerank of 3 (out of more than 1 million pages), plus we are also featured by the links at positions one and seven.

Not bad for a web site that has been public for a mere 1.5 months. And we do not use any specific SEO tricks or tools - it’s content only.


[Image: Duvet-Dayz.com page ranks]





Update 07-December-2006 6.15am
Some people over at slashdot.org commenting on the AMS article suggested the following “slightly” simplified version of the Google algorithm:

SELECT advertiser, description, link, adcost
FROM tblAdvertisers
WHERE adword LIKE '%searchstring%'
ORDER BY adcost DESC;
