Page Rank

What Is Relevant in the World-Wide Web?

Zakaria Loai
5 min readDec 11, 2023

This is the last lesson in part 1 of the Algorithms Unplugged book which talks about an algorithm that makes Google stand out as a search engine.

The algorithm is called page rank and it helps in showing results that are relevant to your search query.

Imagine our life without search engines if you want to search for a thing on the internet you will have to spend your whole life searching for it and you may be at the end not find what you searching for, so search engines help you to narrow the search to be more relevant to what you want.

In search engines, we can’t depend only on what a user enters in the search box to narrow his search because we will find millions of things on the web containing what he writes or related to it and if a search engine shows the results randomly you will end up wasting time not as much as searching the whole web of course but still you have to spend many hours or maybe days to find out what you are searching for.

So from all of the above, it’s really important to order the results of the search and not randomly show them.

The page rank algorithm helps in doing this by giving each page on the web a certain rank and sorting them by this rank so if you search for Medium as an example you will find the first result in your web Medium website then maybe a Wikipedia article about Medium and as you scroll down you will get the pages which are not as high rank as Medium.

I think you are now asking how it can be calculated how do they know that this page has more visitors than that one and therefore it appears first?

Well, The answer is by tracking two things the number of pages which can lead to this page and the preference of users when they see that page they click on it and often choose to complete surfing on it or they just choose another choice or return.

For our example Medium we will see that the two conditions are valid first of course other websites have links that redirect you to Medium and also many users press on Medium links from other sites and by this we guarantee that this website is relevant to many users and when anyone search for word medium you will have the medium website as the first result.

As an example, we will have a small graph representing our web pages and see how it works.

In the above graph, the page that will be the most relevant is page A because it can be reached from all other pages in this graph. But as we said before it does not depend only on that to have a high-rank page it also depends on when users for example on node D. What is the most choice from this page? Is it to go to A, B or C this also plays a vital role in detecting the page rank.

PageRank Algorithm (Algorithms Unplugged page no. 95)

This is almost what happens in the page rank algorithm at first we consider that each page has the same relevance score which is 1.

Then while the relevance scores are still changing from one step to another we will calculate the new relevance score for each page.

Let’s analyse the equation first we have 4/5 which represents 80% of what will happen that you will come from another page that links to this page.

We will multiply it by the summation of all the pages that can link to the page we calculate its relevance. Each page is equal to the relevance of Q / number of links of Q.

The Relevance Of Q is set to 1 initially at each step this number and number will be changed.

The number of links of Q is the number of links in the page (e.g. in our graph D page has 3 links to A, B and C).

1/5​ This fraction represents the 20% probability that, at any given step, a random jump can happen from one page to another that is not linking together.

1/no. of pages This part ensures that the random jump is equally likely to land on any of the pages in the system.

For example, if we have this graph:

Graph from Algorithms unplugged book page no. 92

The equations of each page will be the following:

Equations for the graph from Algorithms unplugged book page no. 92

We notice that F is a dead-end reach to it you will have nothing to do next so as a solution we have added the 20% (1/5) probability of jumping from a page to another with no link between them.

Also, E has no page linked to it so unless you start from it you can’t reach this page and the added probability of jumping makes it available to be accessed.

These two points declare why we consider adding these ratios to our equations.

Looping through these different equations we will have the following table:

The table represents the relevance score of each page at each step (Algorithms unplugged book page no. 95)

The scores are improving with every step, and when they are not changing
by much anymore, this is an indication that we are close to the exact solution.

That was how page rank works and how it helps you to get what you search for on the world wide web. Next, we will start on the second part of the book where it focuses more on Arithemtic and Encryption Algorithms. So stay tuned.

--

--