Question

I have an array of arbitrary hashes, with an element of the hash an integer (call it 'id'). I want to sort these hashes into a number of buckets (constant over the array), where each bucket is an arbitrary range of 'ids' (e.g. 1-10, 15-20, 20-30). What is the best sorting strategy to do this? Is it possible to do without a nested loop?

Was it helpful?

Solution

If the number of buckets is small, you are probably better off with the nested loops. The outer loop over the hashes, and the inner over the buckets. O(n*m).

If the number of hashes, and the number of buckets are large, you can:

hashes = sort(hashes)
buckets = sort(buckets) # sort by lower-bound of bucket
i = 0

foreach (hash in hashes) {
  while (buckets[i].lower_bound > hash) {
    i = i + 1
  }
  bucket[i].add(hash)
}

The basically loops through the hashes adding them to the current bucket and advancing to the next bucket when needed. O(n*log(n) + m*log(m))

OTHER TIPS

If the hashes are good quality, they will exhibit an even distribution, so you can use evenly-distributed buckets to partition the collection in a single pass.

If you also want the hashes sorted within the buckets, use a normal sorting algorithm after everything is in buckets. This would be an unusual use of hashes, however. (If you aren't trying to sort within buckets, then the word "sort" is a misnomer. What you really wanted was partitioning.)

You don't mention a language/platform, but for efficient in terms of keystrokes (C#):

        var histogram = new[] { 0, 10, 15, 20, 30, 40 };
        var values = new[] { 12, 14, 5, 6, 7, 1, 34, 26, 17 };
        var bars = values.GroupBy(v => histogram.First(b => v < b));
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top