Question

How can I uniquify the following list in Python:

all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
                (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]

Desired output is:

[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]

i.e. I need to get rid of tuples that have the same set of numbers but in different order.

I tried

set(all_the_ways)

but it only transpose elements.

And when I do

list(map(set, all_the_ways))

the things getting only worse:

[{5}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1}]

In other words I need to convert inner tuple to a collection that allows multiple equal items (set is not suitable) and for which permutations of elements don't change the collection itself (kinda like C++'s multiset)

Was it helpful?

Solution

How about this:

list(set(tuple(sorted(s)) for s in all_the_ways))

Output:

[(1, 2, 2), (5,), (1, 1, 1, 1, 1), (1, 1, 1, 2)]

It will mangle the order of each tuple though. I'm assuming that doesn't matter, as tuples containing the same set of numbers are considered the same for your case. What this implies is that in the end, the output list might contain tuples that are not among the original input, for example (credit to @thefourtheye):

all_the_ways = [(2, 1, 2), (2, 2, 1)]
# Output: [(1, 2, 2)]

This may or may not be a problem, and if it is, you can use the more robust solutions which are already mentioned in the other excellent answers.

OTHER TIPS

Use collections.Counter() to identify the unique multisets:

>>> from collections import Counter

>>> all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
                (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]
>>> result = []
>>> seen = set()
>>> for tup in all_the_ways:
        key = tuple(sorted(Counter(tup).items())) # unique signature
        if key not in seen:
            result.append(tup)
        seen.add(key)

>>> result
[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]

If the Order doesn't matter you can use this

from collections import Counter
>>> {frozenset(Counter(tup).items()):tup for tup in data}.values()
# [(1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1), (5,)]

If you want to maintain the Order,

from collections import Counter, OrderedDict
OrderedDict([frozenset(Counter(tup).items()),tup] for tup in data).values()
# [(5,), (1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1)]

In both the solutions we rely on frozenset, because the set objects are not hashable as they are mutable. In the first case, we construct a dictionary with the frequency of the numbers (determined with Counter) as the key and the current tuple as the value corresponding to that. Once the dictionary construction is completed, we take all the values, which correspond to the tuples.

In the second case, we simply use OrderedDict to maintain the order.

May be this?:

result = {tuple(sorted(x)) for x in all_the_ways}

Try

from collections import OrderedDict
print OrderedDict.fromkeys(map(lambda x: tuple(sorted(x)), all_the_ways)).keys()

or

print set(map(lambda x: tuple(sorted(x)), all_the_ways))

I take it that you consider two elements "equal" if they contain the same values, regardless of the order.

So you can "canonicalize" each tuple by sorting it, convert back to tuples (so they are hashable), and remove duplicates using set again:

set( tuple(sorted(tup)) for tup in all_the_ways )

You can also preserve the original "outer" order, by using OrderedSet instead of set.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top