How worthwhile is this optimization for Heapsort?

https://stackoverflow.com/questions/12502748

02-07-2021
|

Pergunta

The traditional Heapsort algorithm swaps the last element of the heap with the root of the current heap after every heapification, then continues the process again. However, I noticed that it is kind of unnecessary.

After a heapification of a sub-array, while the node contains the highest value (if it's a max-heap), the next 2 elements in the array must follow the root in the sorted array, either in the same order as they are now, or exchanging them if they are reverse-sorted. So instead of just swapping the root with the last element, won't it be better to swap the first 3 elements (including the node and after the if necessary exchange of the 2nd and 3rd elements) with the last 3 elements, so that 2 subsequent heapifications (for the 2nd and 3rd elements ) are dispensed with?

Is there any disadvantage with this method (apart from the if-needed swapping of the 2nd and 3rd elements, which should be trivial)? If not, if it is indeed better, how much performance boost will it give? Here is the pseudo-code:

function heapify(a,i) {
  #assumes node i has two nodes, both heaps. However node i itself might not be a heap, i.e one of its children may be greater than its value.
  #if so, find the greater of its two children, then swp the parent with that value.
  #this might render that child as no longer a heap, so recurse
 }

function create_heap(a) {
   #all elements following index |_n/2_| are leaf nodes, thus heapify() should be applied to all elements within index 1 to |_n/2_|
   }

function heapsort(a) {
  create_heap(a);  #a is now a max-heap
  #root of the heap, that is a[1] is the maximum, so swap it with a[n].
  #The root now contains an element smaller than its children, both of which are themselves heaps.
  #so apply heapify(a,1). Note: heap length is now n-1, as a[n] is the largest element already found
  #now again the array is a heap. The highest value is in a[1]. Swap it with a[n-1].
  #continue
  }

Suppose the array is [4,1,3,2,16,9,10,14,8,7]. After running a heapify, it will become [16,14,10,8,7,9,3,2,4]. Now the heapsort's first iteration will swap 16 and 4, leading to [4,14,10,8,7,9,3,2,16]. Since this has now rendered the root of the new heap [4,14,10,8,7,9,3,2] as, umm, un-heaped, (14 and 10 both being greater than 4), run another heapify to produce [14,8,10,4,7,9,3,2]. Now 14 being the root, swap it with 2 to yield [2,8,10,4,7,9,3,14], thus making the array currently [2,8,10,4,7,9,3,14,16]. Again we find that 2 is un-heaped, so again doing a heapify makes the heap as [10,8,9,4,7,2,3]. Then 10 is swapped with 3, making the array as [3,8,9,4,7,2,3,10,14,16]. My point is that instead of doing the 2nd and 3rd heapifications to store 10 and 14 before 16, we can tell from the first heapification that because 10 and 14 follow 16, they are the 2nd and 3rd largest elements (or vice-versa). So after a comparison between them (in case they are already sorted, 14 comes before 10), I swap all the there (16,14,10) with (3,2,4), making the array [3,2,4,8,7,9,16,14,10]. This reduces us to a similar condition as the one after the further two heapifications - [3,8,9,4,7,2,3,10,14,16] originally, as compared to [3,2,4,8,7,9,16,14,10] now. Both will now need further heapification, but the 2nd method has let us arrive at this juncture directly by just a comparison between two elements (14 and 10).

Solução

The second largest element of the heap is present in the second or third position, but the third largest can be present further down, at depth 2. (See the figure in http://en.wikipedia.org/wiki/Heap_(data_structure) ). Furthermore, after swapping the first three elements with the last three, the heapify method would first heapify the first subtree of the root, followed by the second subtree of the root, followed by the whole tree. Thus the total cost of this operation is close to three times the cost of swapping the top element with the last and calling heapify. So you won't gain anything by doing this.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow