Question

The data sets I am working with show the following characteristic trends:

  1. The first 50-70% of the array is almost sorted, with the last 30% completely scrambled.

    • Will it be effective if I replace the insertion sort part with shell sort?
  2. The first 50-70% of the array is almost sorted, with the last 30% containing a lot of turtles (small values stuck near the end of the array).

    • Does the occurrence of turtles matter so much that I should ditch Timsort in favor of this Comb sort variation - here? Both show a best case of O(n), but the average case favors Timsort at O(n log n), while Comb sort is listed as Ω(n log n). Does that bound take the modified version of Comb sort, or the density of turtles, into account?
  3. Same as the 2nd scenario, but partially sorted output is fine if it improves performance. For example, an array of 1,000,000 elements can have its smallest 1% (i.e. 10,000 elements) placed in the first 1% of slots without being sorted internally.

    • Can this be done by bailing out of Quick sort after a certain recursion depth, so that elements are placed approximately near their deserved slots?
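For what it's worth, scenario 3 can be sketched roughly like this: a Quick sort that simply stops recursing once a subarray is shorter than a chosen cutoff. Because partitioning fully separates smaller from larger elements, each abandoned block holds exactly the elements that belong in that range of slots, so every element ends up within `cutoff` positions of its final place without the blocks being sorted internally. This is an illustrative sketch only (the class name and cutoff are made up for the example), not the Timsort code in question:

```java
import java.util.Random;

// Illustrative sketch: quicksort that stops recursing once a subarray is
// shorter than `cutoff`. Each abandoned block then holds exactly the elements
// belonging to that range of slots, so every element lands within `cutoff`
// positions of its fully sorted place.
public class PartialQuickSort {

    public static void partialSort(int[] a, int cutoff) {
        sort(a, 0, a.length - 1, cutoff);
    }

    private static void sort(int[] a, int lo, int hi, int cutoff) {
        if (hi - lo + 1 < cutoff) return; // bail out: block is "close enough"
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, cutoff);
        sort(a, p + 1, hi, cutoff);
    }

    // Lomuto partition around the last element.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int n = 100_000, cutoff = 1_000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;   // distinct values 0..n-1
        Random r = new Random(42);
        for (int i = n - 1; i > 0; i--) {       // Fisher-Yates shuffle
            swap(a, i, r.nextInt(i + 1));
        }
        partialSort(a, cutoff);
        // With distinct values 0..n-1, element a[i] belongs at slot a[i],
        // so it must now sit within `cutoff` slots of its final position.
        for (int i = 0; i < n; i++) {
            if (Math.abs(a[i] - i) >= cutoff) throw new AssertionError("at " + i);
        }
        System.out.println("every element within " + cutoff + " slots of its place");
    }
}
```

This matches the 1% example: with 1,000,000 elements and a cutoff of 10,000, the smallest 10,000 values end up in the first 10,000 slots, unsorted internally.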

If it is relevant, here is the Timsort code for Java that I am trying to modify - code.
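For scenario 1, here is a minimal sketch of what the shell-sort substitution might look like: a shell sort over a sub-range, shaped like a drop-in for the insertion-sort step Timsort applies to short runs. The class name and the choice of Ciura's gap sequence are assumptions made for this example, not part of the linked code:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: shell sort over a[lo..hi), shaped like a drop-in for
// the insertion-sort step that Timsort applies to short runs.
public class ShellSortRun {
    // Ciura's empirically derived gaps; the final gap of 1 makes the last
    // pass a plain insertion sort, which guarantees a fully sorted range.
    private static final int[] GAPS = {701, 301, 132, 57, 23, 10, 4, 1};

    // Sorts a[lo..hi) in place.
    public static void sort(int[] a, int lo, int hi) {
        for (int gap : GAPS) {
            if (gap >= hi - lo) continue;     // skip gaps wider than the range
            for (int i = lo + gap; i < hi; i++) {
                int key = a[i];
                int j = i;
                while (j >= lo + gap && a[j - gap] > key) {
                    a[j] = a[j - gap];        // shift the gap-spaced run right
                    j -= gap;
                }
                a[j] = key;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new Random(1).ints(2_000, 0, 10_000).toArray();
        int[] expected = a.clone();
        Arrays.sort(expected, 500, 1_500);    // reference: sort the same range
        sort(a, 500, 1_500);
        System.out.println(Arrays.equals(a, expected)); // true if correct
    }
}
```

Whether this beats binary insertion sort on your runs depends entirely on run length and how scrambled they are, which is why measuring on representative data matters.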


Solution

I think the best answer is that it is not possible to reliably predict whether customizing Timsort will produce worthwhile performance improvements for your datasets. You will just have to try it and see.

And I'm going to repeat my advice from my comment: profile it first!

Until you have profiled the application running on representative data, you cannot know if there is even a chance that this will help. For instance, if the computation is only spending 5% of its time sorting data, then a 50% speedup of the sort algorithm will only result in a 2.5% speedup of the application. And that is simply not worth wasting your time on.
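That arithmetic is just Amdahl's law. A tiny check, using the hypothetical 5% / 50% numbers from the example above (the method name is mine):

```java
// Back-of-envelope check of the speedup arithmetic above (Amdahl's law):
// if a fraction f of total runtime is spent sorting, and the sort itself
// becomes s times faster, overall runtime shrinks by f * (1 - 1/s).
public class AmdahlCheck {
    static double overallTimeSaved(double f, double s) {
        return f * (1.0 - 1.0 / s);
    }

    public static void main(String[] args) {
        // 5% of time spent sorting, sort made 2x faster (a "50% speedup"):
        System.out.println(overallTimeSaved(0.05, 2.0)); // prints 0.025, i.e. 2.5%
    }
}
```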

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow