سؤال

I have a large array A of size [0, 8388608] of "relatively small" integers A[i] = [0, 131072] and I want to find the most frequently occurring element of every N=32 elements.

What would be faster,

A. Create an associative array B of size 131072, iterate through 32 elements, increment B[A[i]], then iterate through B, find the largest value, reset all elements in B to 0, repeat |A|/32 times.

B. qsort every 32 elements, find the largest range where A[i] == A[i-1] (and thus the most frequent element), repeat |A|/32 times.

(EDIT) C. Something else.

هل كانت مفيدة؟

المحلول

An improvement over the first approach is possible. There is no need to iterate through B. And it can be an array of size 131072

Every time you increment B[A[i]], look at the new value in that cell. Then, have a global highest_frequency_found_far. This start at zero, but after every increment the new value should be compared with this global. If it's higher, then the global is replaced.

You could also have a global value_that_was_associated_with_the_highest_count

for each block of 32 members of A ... {
    size_t B [131072] = {0,0,...};
    size_t highest_frequency_found_so_far = 0;
    int value_associated_with_that = 0;
    for(a : A) { // where A just means the current 32-element sub-block
        const int new_frequency = ++B[a];
        if (new_frequency > highest_frequency_found_so_far) {
            highest_frequency_found_so_far = new_frequency;
            value_associated_with_that = a;
        }
    }
    // now, 'value_associated_with_that' is the most frequent element

    // Thanks to @AkiSuihkonen for pointing out a really simple way to reset B each time.
    // B is big, instead of zeroing each element explicitly, just do this loop to undo
    // the ++B[a] from earlier:
    for(a : A) { --B[a]; }
}

نصائح أخرى

what about a btree? You only need a max of 32 nodes and can declare them up front.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top