Question

I have a list of tuples like this: [(1, 2, 3), (2, 4)] (length of the list, and of the tuples may vary) and I want to get all combinations containing at least one element from every tuple in the list, but also those combinations which contain more.

So the result should be in this example:
[[1, 2, 3, 2, 4], [1, 2, 2, 4], [2, 3, 2, 4], [1, 2, 4], [2, 2, 4], [3, 2, 4], [1, 2, 3, 2], [1, 2, 3, 4], [1, 2, 2], [1, 2, 4], [2, 3, 2], [2, 3, 4], [1, 2], [1, 4], [2, 2], [2, 4], [3, 2], [3, 4]]

The smallest result should contain number of elements equal to number of tuples in the original list, the biggest one should contain all elements present in tuples.

The order of the elements does not matter and duplicates should be eventually eliminated (so [1, 2, 3, 2, 4] = [1, 2, 3, 4] and should be in the result only once, similarly [3, 2] = [2, 3] etc.), but I thought about sorting and/or eliminating duplicates after creating the whole list.

What is the best way to do it? Frankly, I do not even have a clue how to properly start…

Was it helpful?

Solution

You want the Cartesian product of the powersets of the items in L - except when any of them are empty. One way is to just leave the empty elements out when we construct the powerset.

from itertools import product, combinations, chain
L = [(1, 2, 3), (2, 4)]
def powerset(iterable):
    "powerset minus the empty element"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)+1))

print [list(chain.from_iterable(c)) for c in product(*(powerset(x) for x in L))]

prints

[[1, 2], [1, 4], [1, 2, 4], [2, 2], [2, 4], [2, 2, 4], [3, 2], [3, 4], [3, 2, 4], [1, 2, 2], [1, 2, 4], [1, 2, 2, 4], [1, 3, 2], [1, 3, 4], [1, 3, 2, 4], [2, 3, 2], [2, 3, 4], [2, 3, 2, 4], [1, 2, 3, 2], [1, 2, 3, 4], [1, 2, 3, 2, 4]]

OTHER TIPS

Let's the two lists be denoted by X and Y, and their lengths be denoted by |X| and |Y|.

The powerset(X) has length 2^|X|, and the powerset(Y) has length 2^|Y|.

So the product of the two powersets has length 2^(|X|+|Y|). For each item in this product, we would need to combine the parts, take the set (to remove duplicates from the parts) to form a new collection. Then we'd need to take the set of this collection to remove duplicates from the collection. Having to take the set of the full collection could be memory-intensive, since it requires holding the full collection in memory at once.

However, I think there is a faster way to get to the desired end result. If you combine X and Y into a set, XY, then the powerset of XY has length 2^(|XY|). This length is less than 2^(|X|+|Y|) if X and Y share any items in common. Thus you save a factor of 2 for each item in common.

For each item in this powerset, we simply need to check that there is an element from X and an element from Y. Collect all such items and we are done. That's a lot less work, and it is less memory-intensive, since the result can be generated by an iterator.


import itertools as IT

def powerset(iterable, reverse=False, rvals=None):
    """powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"""
    s = list(iterable)
    N = len(s)
    if rvals is None:
        rvals = range(N, -1, -1) if reverse else range(N + 1)
    return IT.chain.from_iterable(
        IT.combinations(s, r) for r in rvals)

def powerreps(X, Y):
    """
    Return powerset with at least one representative from X and Y
    """
    XY = set(X).union(Y)
    for rep in powerset(XY, rvals=range(2, len(XY))):
        if any(x in rep for x in X) and any(y in rep for y in Y):
            yield rep

X, Y = (1, 2, 3), (2, 4)
print(list(powerreps(X, Y)))

yields

[(1, 2), (1, 4), (2, 3), (2, 4), (3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (1, 2, 3, 4)]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top