Definitions
Let's say that you have an array a[0..n-1]
to sort and you want to do it using k
threads.
For simplicity, let's assume that the smallest element has value 0 and the largest have a value m
. If the smallest is not equal 0, then you can scale the values during assigning elements to threads.
Splitting into threads
Partition your array into k
chunks each consisting of at most floor(m/k) + 1
different values of elements.
The i-th
chunk consists of elements a[j]
such that:
(i - 1) * (floor(m/k) + 1) <= a[j] < i * (floor(m/k) + 1)
For example, if you have an array with 10 elements:
a[0..9] = {1, 2, 5, 0, 3, 7, 2, 3 ,4, 6}
and k = 3
, then m = 7
and the 3 chunks are:
chunk_1: elements in range [0,3) -> [1, 2, 0, 2]
chunk_2: elements in range [3,6) -> [5, 3, 3, 4]
chunk_3: elements in range [6,9) -> [6, 7]
Next, assign each chunk to a separated thread. Each thread sorts one chunk and to get the whole array sorted, just concatenate the results from all threads in order:
thread_1
thread_2
...
thread_k
Complexity:
As you know, the complexity of the count sort is O(n + L)
where n
is the number of elements to sort, and L
is the maximum value of element.
First, notice that you can scale down values in each thread in such a way, that L < floor(m/k) + 1
in that thread, so the complexity of count sort in each thread always depends on the number of elements in that thread.
If you assume that the distribution of the values is uniform, then the expected number of elements in each thread is also floor(m/k)
so the total complexity of each thread is O(m/k)
.