Question

External sort complexity has been well described here. On page 2 and 3, it well describes the phase1 and phase2. I understand complexities described in each step. My question is on the line Summing up the costs of phase 1 and phase 2 we get the total running time of Θ(n log2 (n/m)) mentioned on page 4. After pseudo code.

From my understanding - complexity of phase 1 is: O(MlogM) to sort all records in the main memory buffer. We fill the buffers N/M times thus effectively O(N/M * M log M) which is O(NlogM).

How does :

Phase 1: O(NlogM).

and

Phase 2: Θ(n log m/n)

combined result in Θ(n log2 (n/m)) ??

Was it helpful?

Solution

When assessing the performance of external algorithms, we typically only count the number of disk operations needed to execute an algorithm. The paper defines n = N/B and m = M/B.

Phase 1 actually only needs Θ(n) disk operations, since we only read and write each block once. It is now obvious that Θ(n) + Θ(n log n/m) = Θ(n log n/m)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top