Parallel Sorting — The Need for Speed

6 min readNov 11, 2023

This chapter, It’s mainly talks about how to implement a sorting algorithm using hardware components. I know that if you are a software engineer such a topic may not bring your interest but I will try to make you interested first you must consider it as a life example for doing a merge sort with more enhanced time complexity maybe this title will attract you more😂.

Let’s start with defining comparators, a comparator is an electric circuit that compares two signals and gives an indication of the extent of their dissimilarity. From the definition, we can see that we can use comparators to sort some numbers that entered the circuit but it’s limited because the comparator is similar to most electronic components it has only two inputs. Therefore one comparator can only sort two numbers.

For example, we can suppose we have entered two inputs a = 7 and b = 4 it will processed as follows:

Well, to simplify the draw we will be considering the right image to represent one comparator in the rest of the article.

The circuit works as follows let's say that the first input is x[1] and the second input is x[2]. x[1] will receive the smaller key by comparing two inputs min{a,b}, and x[2] will receive the larger key by compare inputs max{a,b}.

This single comparator can be used to help implement conditional exchange operations in merge or quick sort for example but it will executed sequentially one after another and this is not optimal of course and takes big time.

but if we used many comparators it could help us to do many comparisons at the same time and it will be faster than sequential algorithms because we can do two comparisons for example at the same time.

The below image shows 6 comparators that can do a max. 2 comparisons at the same time and sort 4 inputs. Let's consider the input is of sequence (4,3,2,1).

If any input is under process in other words compared with other inputs it will not be available to do any other process on them they are reserved by the comparator that works on them. If you think in a logical way we can’t compare 4 and 3 when 4 already had a comparison with one. In simple terms, you can’t cook and go out with your friends at the same time 😅.

Such a circuit is called a bitonic sorting circuit what is the meaning of bitonic well bitonic is a sequence in which its length can only be equal to 2,4,8,16,…,2^n, and it is increased in order and at a certain point starts to decrease or vice-versa.

Examples of bitonic sequences: 0,1,3,5,7,5,3,1 (it has a length of 8 it increases then decreases). The point at which the sequence changes from inc. to decrease or vice versa is called a bitonic point (5,7).

We will discuss some laws now so that you can determine how many parallel steps I need to sort n inputs t(n), how many comparators I can use in each parallel step, and how many comparators we need s(n). we will start with a small example that will make you understand how these rules come up.

First, How many parallel steps which denoted as t(n)?

t(n) = 1 + 2 + 3 + …. + (k-1) + k = ⅀ i where i starts with 1 and ends with k which is equal to log₂n. In other terms t(n) = 1/2 * k * (k+1) = 1/2 * log₂n * (log₂n + 1).

For example, if you have a circuit of 4 inputs then k = log₂4 = 2 and t(n) = 1+ 2 = 3, so simple right? , we can do 3 parallel steps to sort 4 numbers.

Second, How many comparators I can use in each parallel step?

It will be equal n/2 where n is number of inputs

For example, if we have the same circuit of 4 inputs then the total number of comparators in each step will be 4/2 = 2. which means we can use 2 comparators at the same time.

Finally, How many total comparators do we need in a circuit denoted as s(n)?

well, it’s a thing that can be concluded from the first and second rules by multiplying both of them you can get how many comparators you need.

s(n) = n/2 * t(n) = 1/4 * n * log₂n * (log₂n + 1)

For example, The circuit with 4 inputs will have a total of s(n) = 1/4 * 4 * 2 * (2+1) = 6 comparators that will be used to manage sorting.

The image below I was drawn to demonstrate the simple example of 4 input elements which are (16,5,12,4) as you can see it’s a valid bitonic sequence and it will be sorted in 3 steps, you will have 2 comparators at each step and 6 comparators as total number of comparators in the circuit.

I think now you can understand how these calculations work and how bitonic merge parallel sort works.

From the previous lesson on fast sorting algorithms, The merge sort time complexity was O(n log₂n). Bitonic sort exchanges the n of merge sort with 1/2* (log₂n + 1) which has a great difference because if our number of inputs is equal to 2²⁰ that means that factor in merge sort which is equal to 1,048,576 is decreased to be 11.5 in the parallel sorter.

Conclusion the parallel sorter is faster than the merge sort but this fastness is brought at some price which is the number of comparators we used to sort the numbers if we have 2²⁰ inputs this means we will have 110,100,480 comparators to sort this number of inputs which lead to that the price of the needed hardware increases.

The below image will show you the complex example in the book that I am assure you can now understand easily.

bitonic sort circuit which sort 16 input

Finally, some improvements have been introduced to make the number of parallel steps directly proportional to log₂n instead of (log₂n)² in a bitonic sort circuit and it’s called AKS-sorting circuit but it’s only better than bitonic in case we have n ≥ 2¹²⁴⁰⁰. So it will only show a difference in case you have a really big number of inputs.

Anyway, if you want to know more about AKS and when it will show a difference in the running time the book has suggested two references to look at them which are:

Miklós Ajtai, János Komlós, Endre Szeméredi: Sorting in c · log n parallel steps. Combinatorica, 3:1–19, 1983. This paper describes the currently asymptotically “fastest” sorting circuit, the famous AKS-sorting circuit. Unfortunately, the constant c in the paper’s title is much too large for real-world applications of the circuit.
Mike S. Paterson: Improved sorting networks with O(log n) depth. Algo-
rithmica, 5:75–92, 1990. Here, a considerably simpliﬁed construction of the AKS-sorting circuit is presented. This paper is very well written and fun to read (if you are a theoretical computer scientist). This variant of the AKS circuit “only” needs 6,200 · log2 n parallel steps.

So finally that’s it still tuned for more exciting topics introduced in the Algorithms Unplugged book and simply explained by me😊. The next lesson is Topological sorting so stay focused😉.

Parallel Sorting — The Need for Speed

Written by Zakaria Loai

No responses yet