Pregunta

I have 3 groups of items(names). Group 1 has around 2,1k items, group 2 - around 7,6k and group 3 around 21k. I need to search in these groups. I need a hint which is better. I thought either to put all in a bin tree:

 GTree* t = g_tree_new((GCompareFunc)g_ascii_strcasecmp);
 and search like this:
 goup =  g_tree_lookup(t, (gpointer *)itemName);

Or would it be more efficient to make 3 arrays of strings:

char g1[2300][14];
char g2[8000][14];
char g3[78000][14];

And search like this (not checked, pseudocode):

    int isvalueinarray(char val,  char *g[][14];, int size){
        int i;
        for (i=0; i < size; i++) {
            if (memcmp(val, g[i], strlenth) == 0)
                return true;
        }
        return false;
    }
    int i group=0;
    if (isvalueinarray(itemName, g2, 7800) ) group = 2;
    if (isvalueinarray(itemName, g1, 2300) ) group = 1;

Or is there a better solution for that?

¿Fue útil?

Solución 2

If you want the most asymptotically efficient way of finding whether a string is in a set of strings which can be pre-processed, you can consider using a trie. The construction time is linear (O(L), where L is the sum of all the strings lengths in three sets), and the lookup time is linear on the size of a string which you are looking for (which is, in your case, 14).

A binary search tree would also be a nice option, giving you a logarithmic (on the size of the string sets) performance. This can be a bit slower, but probably easier to implement. Note that pre-processing (inserting all the strings of three sets into a tree) takes N * log(N) time, where N is the sum of set sizes.

Do not use linear search in an array, it's too slow.

Otros consejos

The mathematically fastest would be to have everything in one tree and do a O(logN) binary search. Because Log(3N) is faster then 3 * Log(N). (N being the size of each array)

But in no case you would search using the pseudocode you wrote, you ALWAYS do binary search in a huge set of data. The one you wrote has complexity O(N) while a binary search has O(logN).

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top