Question

I have a huge int array for which I need to find the mode.

I've seen a few methods that use two for loops (one nested), which seems unnecessary.
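Roughly something like this, I mean (a sketch; the variable names are just for illustration):

int mode = elements[0];
int bestCount = 0;
for (int i = 0; i < elements.length; i++) {
    // Count how often elements[i] occurs by rescanning the whole array.
    int count = 0;
    for (int j = 0; j < elements.length; j++) {
        if (elements[j] == elements[i]) {
            count++;
        }
    }
    if (count > bestCount) {
        bestCount = count;
        mode = elements[i];
    }
}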

The only way I can think of to find the mode with only one loop involves using a Map:

int[] elements = new int[]{....numbers...};
Map<Integer,Integer> map = new .....Map Type....;
for (int number : elements) {
    if (map.containsKey(number)) {
        map.put(number, map.get(number) + 1);
    } else {
        map.put(number, 1);
    }
}

I'm not sure what kind of speed benefit using maps would actually give. Is there a better method?

Was it helpful?

The solution

If you use a hash map, the runtime complexity of your algorithm should be O(n): you visit each of the n elements once, and HashMap lookups and writes are usually assumed to be O(1). So in total you get O(n * 1), which is O(n). If you use a tree map, you get O(n log n).
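For concreteness, here is a minimal self-contained sketch of that single-pass approach (the method and variable names are illustrative, and it assumes a non-empty array):

import java.util.HashMap;
import java.util.Map;

static int mode(int[] elements) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (int number : elements) {
        // merge() stores 1 on first sight, otherwise adds 1 to the existing count.
        counts.merge(number, 1, Integer::sum);
    }
    int mode = elements[0];
    int bestCount = 0;
    for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
        if (entry.getValue() > bestCount) {
            bestCount = entry.getValue();
            mode = entry.getKey();
        }
    }
    return mode;
}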

Compared to two nested loops (which sounds like O(n²)), the speed improvement is going from quadratic to linear, which is quite good: for 1000 elements, you perform roughly 1000 "steps" instead of 1,000,000.

P.S. Getting better than linear is probably hard here -- I can't imagine a way of computing the mode without visiting each element at least once.

Other tips

As Stefan Haustein already wrote, the complexity using a map is much lower than with two nested for loops.

There is one further improvement, or rather specialization, that can be applied if you know the range of the numbers stored in your array. For example, if you are counting colors in the range 0-255, you don't have to use a map and can instead use a simple array.

int[] elements = new int[]{....numbers...};
int[] histogram = new int[256]; // 255 = highest possible value in elements
for (int number : elements) {
    ++histogram[number];
}
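To actually recover the mode from these counts, one more pass over the (small, fixed-size) histogram suffices; a minimal sketch, assuming all values really are in 0-255:

int mode = 0;
for (int value = 1; value < histogram.length; value++) {
    // Keep the value with the highest count; ties resolve to the smaller value.
    if (histogram[value] > histogram[mode]) {
        mode = value;
    }
}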

Using a map is the more generalized way. You can think of a map as an array with a more complex indexing function: in a normal array, an element lives at base address + index, while in a map the slot is computed from the key using a hash function.
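As a rough illustration (ignoring the collision handling that real implementations add on top; key and capacity here are just placeholders), that indexing step boils down to something like:

// Simplified sketch: map a key to an array slot. java.util.HashMap
// additionally spreads the hash bits and chains colliding entries into
// buckets, but the core idea is the same.
int slot = key.hashCode() & (capacity - 1); // capacity is a power of two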

No algorithm for this problem can be faster than O(n) (have a look at the Wikipedia page for big-O notation). At least, not consistently (across all possible inputs). This does not mean that nothing can run faster in practice -- just that, beyond a certain problem size, any faster solution is at most a (probably small) constant factor ahead; the gap cannot keep growing with n.

This is because, whatever the order in which you examine the elements, given an array that is "almost balanced" as to the winner, the last element you examine can turn out to be the tiebreaker: in {1, 1, 2, 2, x}, for instance, the final element x decides the mode. Give me any algorithm that doesn't look at all elements, and I can write an input array that will make it return incorrect results. Therefore, you have to examine all of them at least once: O(n) complexity.

Hash maps have general insert and lookup complexities of O(1) -- that is, on average, regardless of the size of the data, they take a constant amount of time to do their thing. Note that this constant is several times larger than, say, an array update or lookup (see TwoThe's answer). Therefore, except for constants (which will vary depending on hash map implementation, machine, VM, and so on), you can't get much faster than the code you posted. If you really need that extra 10% of performance, build a benchmark on hardware/software/input data as near as possible to your intended deployment, and optimize that; a sketch of such a benchmark follows below.
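A hypothetical sketch using JMH, comparing the two approaches discussed above (the class name, data sizes, and value range are assumptions, and JMH must already be on the classpath with its annotation processor configured):

import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ModeBenchmark {
    int[] elements;

    @Setup
    public void fill() {
        // Deterministic pseudo-random data; values in 0-255 so both variants apply.
        Random random = new Random(42);
        elements = new int[1_000_000];
        for (int i = 0; i < elements.length; i++) {
            elements[i] = random.nextInt(256);
        }
    }

    @Benchmark
    public int hashMapMode() {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int number : elements) {
            counts.merge(number, 1, Integer::sum);
        }
        int mode = elements[0];
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                mode = entry.getKey();
            }
        }
        return mode; // returning the result keeps the JIT from discarding the work
    }

    @Benchmark
    public int histogramMode() {
        int[] histogram = new int[256];
        for (int number : elements) {
            ++histogram[number];
        }
        int mode = 0;
        for (int value = 1; value < histogram.length; value++) {
            if (histogram[value] > histogram[mode]) {
                mode = value;
            }
        }
        return mode;
    }
}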

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow