Question

Given $n$ arrays of size $k$ each, we want to show that at least $\Omega(nk \log k)$ comparisons are needed to sort all arrays (indepentent of each other).

My proof is a simple modification of the decision tree argument used to obtain the lower bound for comparison-based sorting of one array. More specifically, I argue that there are in total $k!^n$ possible permutations for the entries in all given arrays, and that a binary tree with that number of leaves is of height $h \in \Omega(nk \log k)$. Is that argument correct?

Furthermore, I was told that merely observing that one needs $\Omega(k \log k)$ comparisons for each of the arrays and we need to sort $n$ times in total (for $n$ arrays) is not a sufficient argument. Why is that? My answer would be that this is just one possible approach to this problem, and not a general argument excluding each and every other potential comparison-based algorithm for solving the given task with less than $\Omega(nk \log k)$ comparisons. However, this is not particularly concise and I would consider a rather technical argument (which I don't see) as more appropriate. What would that be?

Was it helpful?

Solution

For the first part, as the comments suggest - observe that there are $k!^n$ possible permutations for the inputs - indeed, every array has $k!$ permutations, and there are $n$ arrays, so you have $k!$ options for the first array, $k!$ for the second, and so on.

So the height of a decision tree is $$\log (k!^n)=n\log (k!)\in \theta(nk\log k)$$ So you need at least that many comparisons in order to determine a single leaf.

For the second part - the argument you suggest is quite close to a formal explanation. The lower bound of $\Omega(k\log k)$ is for comparison-based algorithms that compare elements in the array. When you have several arrays, then it is not justified to assume that sorting them is equivalent to sorting each one independently.

A-priori, it may be easier to sort them by comparing elements between the arrays. But the lower bound on sorting does not utilize such a setting, so it is not directly relevant for this problem.

In my opinion, this is as formal as you can get without formally defining what a comparison-based algorithm actually is. To define the latter, you need to formally model the computational model you use (which would be a sort of pointer-machine), and this is unlikely to result in any insights other than what you already mentioned.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top