Question

I want to generate combinations that associate indices in a list with "slots". For instance,(0, 0, 1) means that 0 and 1 belong to the same slot while 2 belongs to an other. (0, 1, 1, 1) means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots but do not carry information for my usage.

Consequently, (0, 0, 0) is absolutely identical to (1, 1, 1) for my purposes, and (0, 0, 1) is equivalent to (1, 1, 0).

The classical cartesian product generates a lot of these repetitions I'd like to get rid of.

This is what I obtain with itertools.product :

>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
>>>
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]

And this is what I'd like to get:

>>> [(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1)]

It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?

If it's unclear, please tell me so that I can clarify my question. Thank you!

Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:

def test():
    for p in product(range(2), repeat=3):
        j=-1
        good = True
        for k in p:
            if k> j and (k-j) > 1:
                good = False
            elif k >j:
                j = k
        if good:
            yield p
Was it helpful?

Solution

I would start by making the following observations:

  1. The first element of each combination must be 0.
  2. The second element must be 0 or 1.
  3. The third element must be 0, 1 or 2, but it can only be 2 if the second element was 1.

These observations suggest the following algorithm:

def assignments(n, m, used=0):
    """Generate assignments of `n` items to `m` indistinguishable
    buckets, where `used` buckets have been used so far.

        >>> list(assignments(3, 1))
        [(0, 0, 0)]
        >>> list(assignments(3, 2))
        [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
        >>> list(assignments(3, 3))
        [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]

    """
    if n == 0:
        yield ()
        return
    aa = list(assignments(n - 1, m, used))
    for first in range(used):
        for a in aa:
            yield (first,) + a
    if used < m:
        for a in assignments(n - 1, m, used + 1):
            yield (used,) + a

This handles your use case (12 items, 5 buckets) in a few seconds:

>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475

This is substantially faster than the function you give at the end of your answer (the one that calls product and then drops the invalid assignments) would be if it were modified to handle the (12, 5) use case:

>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363

OTHER TIPS

Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through the lists and assign set-affiliation numbers for differing elements starting at 0, counting upwards. That is, you create a temporary dictionary per line that you are processing.

An exemplary output would be

(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)

but

(1,0,1) -> (0,1,0)

Removing the duplicates can then easily be performed as the problem is reduced to the problem of the solved question at Python : How to remove duplicate lists in a list of list?

If you only consider the elements of the cartesian product where the first occurrences of all indices are sorted and consecutive from zero, that should be sufficient. itertools.combinations_with_replacement() will eliminate those that are not sorted, so you'll only need to check that indices aren't being skipped.

In your specific case you could simply take the first or the second half of the list of those items produced by a cartesian product.

import itertools

alphabet = '01' 
words3Lettered = [''.join(letter) for letter in itertools.product(alphabet,repeat=3)] 

for n lettered words use repeat=n

words3Lettered looks like this:

['000', '001', '010', '011', '100', '101', '110', '111']

next,

usefulWords = words3Lettered[:len(words3Lettered)/2]

which looks like this:

['000', '001', '010', '011']

you might be interested in the other half i.e. words3Lettered[len(words3Lettered)/2:] though the other half was supposed to "fold" onto the first half.

most probably you want to use the combination of letters in numeric form so...

indexes = [tuple(int(j) for j in word) for word in usefulWords]

which gives us:

[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top