How to generate a list of subsets with restrictions?

https://stackoverflow.com/questions/1521589

19-09-2019

Question

I am trying to figure out an efficient algorithm to take a list of items and generate all unique subsets that result from splitting the list into exactly 2 sublists. I'm sure there is a general purpose way to do this, but I'm interested in a specific case. My list will be sorted, and there can be duplicate items.

Some examples:

Input
{1,2,3}

Output
{{1},{2,3}}
{{2},{1,3}}
{{3},{1,2}}

Input
{1,2,3,4}

Output
{{1},{2,3,4}}
{{2},{1,3,4}}
{{3},{1,2,4}}
{{4},{1,2,3}}
{{1,2},{3,4}}
{{1,3},{2,4}}
{{1,4},{2,3}}

Input
{1,2,2,3}

Output
{{1},{2,2,3}}
{{2},{1,2,3}}
{{3},{1,2,2}}
{{1,2},{2,3}}
{{1,3},{2,2}}

I can do this on paper, but I'm struggling to figure out a simple way to do it programmatically. I'm only looking for a quick pseudocode description of how to do this, not any specific code examples.

Any help is appreciated. Thanks.

Solution

The following C++ function does exactly what you need, although it prints the pairs in a different order from your examples:

#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

// input contains all the input numbers; duplicates are allowed
void generate(std::vector<int> input) {
  typedef std::map<int,int> Map;
  Map mp;  // value -> number of occurrences
  for (size_t i = 0; i < input.size(); ++i) {
    mp[input[i]]++;
  }

  std::vector<int> numbers;
  std::vector<int> mult;
  for (Map::iterator it = mp.begin(); it != mp.end(); ++it) {
    numbers.push_back(it->first);
    mult.push_back(it->second);
  }

  std::vector<int> cur(mult.size());
  for (;;) {
    size_t i = 0;
    while (i < cur.size() && cur[i] == mult[i]) cur[i++] = 0;
    if (i == cur.size()) break;
    cur[i]++;
    std::vector<int> list1, list2;
    for (size_t i = 0; i < cur.size(); ++i) {
      list1.insert(list1.end(), cur[i], numbers[i]);
      list2.insert(list2.end(), mult[i] - cur[i], numbers[i]);
    }
    if (list1.size() == 0 || list2.size() == 0) continue;
    if (list1 > list2) continue;
    std::cout << "{{";
    for (size_t i = 0; i < list1.size(); ++i) {
      if (i > 0) std::cout << ",";
      std::cout << list1[i];
    }
    std::cout << "},{";
    for (size_t i = 0; i < list2.size(); ++i) {
      if (i > 0) std::cout << ",";
      std::cout << list2[i];
    }
    std::cout << "}}\n";
  }
}

OTHER TIPS

If you were generating all subsets you would end up generating 2ⁿ subsets for a list of length n. A common way to do this is to iterate through all the numbers i from 0 to 2ⁿ-1 and use the bits that are set in i to determine which items are in the ith subset. This works because any item either is or is not present in any particular subset, so by iterating through all the combinations of n bits you iterate through the 2ⁿ subsets.

For example, to generate the subsets of (1, 2, 3) you would iterate through the numbers 0 to 7:

0 = 000_b → ()
1 = 001_b → (1)
2 = 010_b → (2)
3 = 011_b → (1, 2)
4 = 100_b → (3)
5 = 101_b → (1, 3)
6 = 110_b → (2, 3)
7 = 111_b → (1, 2, 3)
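
The table above can be reproduced with a few lines of Python. This is only a minimal sketch of the bitmask idea; the function name `all_subsets` is made up for illustration and is not part of any library.

```python
def all_subsets(items):
    """Return every subset of items, in the order of the bitmask that produced it."""
    n = len(items)
    subsets = []
    for mask in range(2 ** n):
        # Item i belongs to this subset when bit i of mask is set.
        subsets.append(tuple(items[i] for i in range(n) if mask & (1 << i)))
    return subsets


for mask, subset in enumerate(all_subsets([1, 2, 3])):
    # Show the mask in decimal and binary next to the subset it selects.
    print(mask, format(mask, "03b"), subset)
```

Bit 0 is the rightmost binary digit, so mask 3 (011) selects the first two items and mask 4 (100) selects the third.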

In your problem you can generate each subset and its complement to get your pair of mutually exclusive subsets. Each pair would be produced twice when you do this (once as subset + complement and once the other way round), so you only need to iterate from 1 (skipping the empty subset) up to 2ⁿ⁻¹ - 1 and then stop.

1 = 001_b → (1) + (2, 3)
2 = 010_b → (2) + (1, 3)
3 = 011_b → (1, 2) + (3)
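
As a rough sketch of that halved loop, assuming the items are distinct (duplicates are not handled here), something like the following would do. The name `split_pairs` is hypothetical.

```python
def split_pairs(items):
    """Pair each non-empty subset with its complement, visiting each pair once."""
    n = len(items)
    if n < 2:
        return []  # fewer than two items cannot form two non-empty lists
    pairs = []
    # Masks 1 .. 2**(n-1) - 1 never set the top bit, so the last item always
    # stays on the right and no pair shows up again in mirrored form.
    for mask in range(1, 2 ** (n - 1)):
        left = tuple(items[i] for i in range(n) if mask & (1 << i))
        right = tuple(items[i] for i in range(n) if not mask & (1 << i))
        pairs.append((left, right))
    return pairs


for left, right in split_pairs([1, 2, 3]):
    print(left, "+", right)
```

For a list of n distinct items this yields 2ⁿ⁻¹ - 1 pairs, which matches the 3 and 7 pairs in the question's first two examples.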

To deal with duplicate items you could generate subsets of list indices instead of subsets of list items. For example, with the list (1, 2, 2, 3), generate subsets of the index list (0, 1, 2, 3) and then use those numbers as indices into (1, 2, 2, 3). Basically, add a level of indirection. Pairs that come out identical can then be dropped, for instance by collecting them in a set.

Here's some Python code putting this all together.

#!/usr/bin/env python3

def split_subsets(items):
    subsets = set()

    for n in range(1, 2 ** len(items) // 2):
        # Use ith index if ith bit of n is set.
        l_indices = [i for i in range(len(items)) if n & (1 << i) != 0]
        # Use the indices NOT present in l_indices.
        r_indices = [i for i in range(len(items)) if i not in l_indices]

        # Get the items corresponding to the indices above.
        l = tuple(items[i] for i in l_indices)
        r = tuple(items[i] for i in r_indices)

        # Swap l and r if they are reversed.
        if (len(l), l) > (len(r), r):
            l, r = r, l

        # The set silently drops pairs that were already seen.
        subsets.add((l, r))

    # Sort the subset pairs so the left items are in ascending order.
    return sorted(subsets, key=lambda pair: (len(pair[0]), pair[0]))

for l, r in split_subsets([1, 2, 2, 3]):
    print(l, r)

Output:

(1,) (2, 2, 3)
(2,) (1, 2, 3)
(3,) (1, 2, 2)
(1, 2) (2, 3)
(1, 3) (2, 2)

Here is a bit of Erlang code. The problem is that it generates duplicates when the input has duplicate elements, so the result list still needs to be filtered:

do([E,F]) -> [{[E], [F]}];
do([H|T]) -> lists:flatten([{[H], T}] ++
                           [[{[H|L1],L2},{L1, [H|L2]}]  || {L1,L2} <- do(T)]).

filtered(L) ->
  lists:usort([case length(L1) < length(L2) of true -> {L1,L2};
                                               false -> {L2,L1} end
              || {L1,L2} <- do(L)]).

In pseudocode this means:

For a two-element list {E,F} the result is {{E},{F}}.
For longer lists, take the first element H and the rest of the list T and return
- {{H},T} (the first element as a single-element list, and the remaining list)
- the results of running the algorithm recursively on T: for each {L1,L2} element in the resulting list, return {{H,L1},{L2}} and {{L1},{H,L2}} (that is, H added to the front of either side)
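
The recursion described above could be sketched in Python roughly as follows. The names `splits` and `unique_splits` are made up for this illustration, and the filtering step is an assumption: it orders each pair by length and then by contents, which is slightly stricter than comparing lengths alone.

```python
def splits(items):
    """Recursively split items (at least two of them) into two non-empty lists."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    if len(items) == 2:
        return [([items[0]], [items[1]])]
    head, tail = items[0], items[1:]
    # The head on its own, with the rest of the list on the other side.
    result = [([head], tail)]
    for left, right in splits(tail):
        # Add the head to either side of every split of the tail.
        result.append(([head] + left, right))
        result.append((left, [head] + right))
    return result


def unique_splits(items):
    """Drop the duplicates that appear when items contains repeated values."""
    seen = set()
    for left, right in splits(items):
        a, b = tuple(left), tuple(right)
        # Put the shorter (then smaller) list first so mirrored pairs compare equal.
        seen.add((a, b) if (len(a), a) <= (len(b), b) else (b, a))
    return sorted(seen, key=lambda pair: (len(pair[0]), pair[0]))


for left, right in unique_splits([1, 2, 2, 3]):
    print(left, right)
```

Without the filtering step, `splits([1, 2, 2, 3])` returns 7 pairs, two of which are repeats caused by the duplicated 2.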

My suggestion is...

First, count how many of each value you have, possibly in a hashtable. Then calculate the total number of combinations to consider: the product of (count + 1) over all values, since the first list can hold anywhere from zero to count copies of each value.

Iterate through that number of combinations.

At each combination, copy your loop counter (as x), then start an inner loop through your hashtable items.

For each hashtable item, use (x modulo (count + 1)) as the number of instances of the hashtable key in the first list; the remaining instances go in the second list. Divide x by (count + 1) before repeating the inner loop.
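
A minimal Python sketch of this counting scheme follows. It assumes each value can contribute anywhere from 0 to count copies to the first list, so each value has count + 1 choices. The name `splits_by_counts` is hypothetical, and the final filter (skip empty lists, keep only the pair whose first list compares less than or equal to the second) is borrowed from the C++ answer above rather than from this description.

```python
from collections import Counter


def splits_by_counts(items):
    """Enumerate splits by choosing how many copies of each value go left."""
    counts = sorted(Counter(items).items())  # [(value, count), ...]
    total = 1
    for _, count in counts:
        total *= count + 1  # 0..count copies can go in the first list
    results = []
    for combo in range(total):
        x = combo
        first, second = [], []
        for value, count in counts:
            take = x % (count + 1)  # copies of value in the first list
            x //= count + 1         # move on to the next "digit"
            first += [value] * take
            second += [value] * (count - take)
        # Skip empty sides, and keep only one of each mirrored pair.
        if first and second and first <= second:
            results.append((first, second))
    return results


for first, second in splits_by_counts([1, 2, 2, 3]):
    print(first, second)
```

Because duplicates are folded into counts up front, no pair is generated twice and no set is needed to filter the results.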

If you are worried that the number of combinations might overflow your integer type, the issue is avoidable. Use an array with one item for every hashmap key, each starting from zero, and 'count' through the combinations treating each array item as a digit (so the whole array represents the combination number), but with each 'digit' having a different base (the corresponding count plus one, since a digit can run from zero up to the count). That is, to 'increment' the array, first increment item 0. If it overflows (exceeds its count), set it to zero and increment the next array item. Repeat the overflow check until an increment does not overflow. If overflows continue past the end of the array, you have finished.
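
The digit-by-digit counter could look something like this in Python. It is a sketch under the assumption that digit i may run from 0 up to and including counts[i]; the generator name `mixed_radix_digits` is invented for the example. (Python integers do not overflow, so this mainly illustrates the technique for languages with fixed-width integers.)

```python
def mixed_radix_digits(counts):
    """Yield every digit list where digit i ranges over 0..counts[i]."""
    digits = [0] * len(counts)
    while True:
        yield list(digits)
        i = 0
        # Carry: a digit already at its maximum wraps around to zero.
        while i < len(digits) and digits[i] == counts[i]:
            digits[i] = 0
            i += 1
        if i == len(digits):
            return  # the carry ran off the end, so every combination was seen
        digits[i] += 1


# With one 1, two 2s and one 3, each digit list says how many copies of
# each value go in the first list.
for digits in mixed_radix_digits([1, 2, 1]):
    print(digits)
```

Each yielded list plays the role of the combination number x, already split into its per-value digits, so no modulo or division is needed.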

I think sergdev is using a very similar approach to this second one, but using std::map rather than a hashtable (std::unordered_map should work). A hashtable should be faster for large numbers of items, but won't give you the values in any particular order. The ordering for each loop through the keys in a hashtable should be consistent, though, unless you add/remove keys.

Licensed under: CC-BY-SA with attribution
