Вопрос

I am implementing bucket sort in Java and I find that it sorts (ascending) faster when the input array is sorted (either ascending or descending) rather than random. Why is this? As I understand it, it just goes through the array and increments the "tally" array at the index of each element. I don't see why it would run faster with sorted input, but it seems to be about twice as fast.

Thanks

Это было полезно?

Решение

The reason is very likely due to spatial locality and higher cache hits with a sorted input set.

If you have sorted input, the buckets in the same neighborhood relatively will get a bunch of hits, and as your sorted input moves into a higher range, the next neighborhood of buckets will start getting hits.

Consider a simplified example to illustrate this:

Suppose you have 10 buckets of range size 1,000 each:

[0-999], [1000-1999], ..., [9000-9999]

And suppose that you can only cache the reference to one bucket at a time (this is the contrived part, but the idea is the same in practice:

Now suppose your input set are random numbers between [0 - 9999]

  • If input is sorted, each bucket will get exactly one initial cache miss, followed then by numerous cache hits until the sequence in the input moves into the next bucket's range.
  • If you have unsorted input and in the worst case, the cache will always miss, and load another bucket into the cache, and only have it miss again because the next number in the unsorted sequence will look for yet another bucket.
Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top