[only equal operator]what are the fast algorithms to find duplicate elements in a collection and group them?

StackOverflow https://stackoverflow.com/questions/17740402

Domanda

Suppose we have a collection of elements, and these elements only have equal operator. So, it's impossible to sort them.

how can you pick out those with duplicates and put them into each group with least amount of comparison? preferably in C++, but algorithm is more important than the language. For Example given {E1,E2,E3,E4,E4,E2,E6,E4,E3}, I wish to extract out {E2,E2}, {E3,E3}, {E4,E4,E4}. what data structure and algorithm you will choose?

EDIT

My scenario, if binary data 1 is equal to binary data 2 we can say these two elements are identical. But, only = and != is logical

element 1:

4 0 obj
<< /Type /Pages /Kids 5 0 R /Count 1 >>
stream
.....binary data 1....
endstream
endobj

element 2:

5 0 obj
<< /Type /Pages /Kids 5 0 R /Count 1 >>
stream
.....binary data 2....
endstream
endobj
È stato utile?

Soluzione

It is sufficient to find any arbitrary predicate P such that P(a,a)==false, P(a,b) && P(b,a)==false, P(a,b) && P(b,c) implies P(a,c) and !P(a,b) && !P(b,a) implies a == b. Less-then satisfies this property, as thus greater-then. But they're far from the only possibilities.

You can now sort your collection by predicate P, and all elements which are equal will be adjacent. In your case, define P(E1,E2)=true, P(E2,E3)=true, etc.

Altri suggerimenti

For your answer, though I am not 100% sure that you want this is only.

If you want good algo try Binary search tree creation. as it is a group,and according to BST properties you can easily group elements.

For Example

BST()
{
    count = 0;
    if(elementinserted)
        count = 1;
    if(newelement == already inserted element)
    {
        count++;
        put element in array upto count value;
    }
}

I hope this explanation can help you.

If all you have is an equality test, you have no hope.

Suppose you have a situation where each element is unique. And another where only two elements are duplicates.

There are n(n+1)/2 of the second type. Each can only be distinguished from the first by a particular comparison. Which means in the worst case you must do all n(n+1)/2 comparisons: exhastive search over all pairs.

What you need to do is to figure out what else you can really do, as equality only is exceedingly rare.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top