How to iteratively generate k elements subsets from a set of size n in java?

https://stackoverflow.com/questions/4504974

java
subset

12-10-2019
|

Question

I'm working on a puzzle that involves analyzing all size k subsets and figuring out which one is optimal. I wrote a solution that works when the number of subsets is small, but it runs out of memory for larger problems. Now I'm trying to translate an iterative function written in python to java so that I can analyze each subset as it's created and get only the value that represents how optimized it is and not the entire set so that I won't run out of memory. Here is what I have so far and it doesn't seem to finish even for very small problems:

public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
    int N = set.size();
    int maxsets = nCr(N, k);
    LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();

    int remains, thresh;
    LinkedList<Integer> newset; 
    for (int i=0; i<maxsets; i++)
    {
        remains = k;
        newset = new LinkedList<Integer>();
        for (int val=1; val<=N; val++)
        {
            if (remains==0)
                break;
            thresh = nCr(N-val, remains-1);
            if (i < thresh)
            {
                newset.add(set.get(val-1));
                remains --;
            }
            else 
            {
                i -= thresh;
            }
        }
        toRet.add(newset);
    }

    return toRet;

}

Can anybody help me debug this function or suggest another algorithm for iteratively generating size k subsets?

EDIT: I finally got this function working, I had to create a new variable that was the same as i to do the i and thresh comparison because python handles for loop indexes differently.

Solution

First, if you intend to do random access on a list, you should pick a list implementation that supports that efficiently. From the javadoc on LinkedList:

All of the operations perform as could be expected for a doubly-linked list. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.

An ArrayList is both more space efficient and much faster for random access. Actually, since you know the length beforehand, you can even use a plain array.

To algorithms: Let's start simple: How would you generate all subsets of size 1? Probably like this:

for (int i = 0; i < set.length; i++) {
    int[] subset = {i};
    process(subset);
}

Where process is a method that does something with the set, such as checking whether it is "better" than all subsets processed so far.

Now, how would you extend that to work for subsets of size 2? What is the relationship between subsets of size 2 and subsets of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. Put differently, each subset of size 2 can be generated by taking a subset of size 1 and adding a new element larger than all other elements in the set. In code:

processSubset(int[] set) {
    int subset = new int[2];
    for (int i = 0; i < set.length; i++) {
        subset[0] = set[i];
        processLargerSets(set, subset, i);
    }
}

void processLargerSets(int[] set, int[] subset, int i) {
    for (int j = i + 1; j < set.length; j++) {
        subset[1] = set[j];
        process(subset);
    }
}

For subsets of arbitrary size k, observe that any subset of size k can be turned into a subset of size k-1 by chopping of the largest element. That is, all subsets of size k can be generated by generating all subsets of size k - 1, and for each of these, and each value larger than the largest in the subset, add that value to the set. In code:

static void processSubsets(int[] set, int k) {
    int[] subset = new int[k];
    processLargerSubsets(set, subset, 0, 0);
}

static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
    if (subsetSize == subset.length) {
        process(subset);
    } else {
        for (int j = nextIndex; j < set.length; j++) {
            subset[subsetSize] = set[j];
            processLargerSubsets(set, subset, subsetSize + 1, j + 1);
        }
    }
}

Test code:

static void process(int[] subset) {
    System.out.println(Arrays.toString(subset));
}


public static void main(String[] args) throws Exception {
    int[] set = {1,2,3,4,5};
    processSubsets(set, 3);
}

But before you invoke this on huge sets remember that the number of subsets can grow rather quickly.

OTHER TIPS

You can use org.apache.commons.math3.util.Combinations.

Example:

import java.util.Arrays;
import java.util.Iterator;

import org.apache.commons.math3.util.Combinations;

public class tmp {
    public static void main(String[] args) {
        for (Iterator<int[]> iter = new Combinations(5, 3).iterator(); iter.hasNext();) {
            System.out.println(Arrays.toString(iter.next()));
        }
    }

}

Output: [0, 1, 2] [0, 1, 3] [0, 2, 3] [1, 2, 3] [0, 1, 4] [0, 2, 4] [1, 2, 4] [0, 3, 4] [1, 3, 4] [2, 3, 4]

Here is a combination iterator I wrote recetnly

package psychicpoker;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import static com.google.common.base.Preconditions.checkArgument;

public class CombinationIterator<T> implements Iterator<Collection<T>> {

private int[] indices;
private List<T> elements;
private boolean hasNext = true;

public CombinationIterator(List<T> elements, int k) throws IllegalArgumentException {
    checkArgument(k<=elements.size(), "Impossible to select %d elements from hand of size %d", k, elements.size());
    this.indices = new int[k];
    for(int i=0; i<k; i++)
        indices[i] = k-1-i;
    this.elements = elements;
}

public boolean hasNext() {
    return hasNext;
}

private int inc(int[] indices, int maxIndex, int depth) throws IllegalStateException {
    if(depth == indices.length) {
        throw new IllegalStateException("The End");
    }
    if(indices[depth] < maxIndex) {
        indices[depth] = indices[depth]+1;
    } else {
        indices[depth] = inc(indices, maxIndex-1, depth+1)+1;
    }
    return indices[depth];
}

private boolean inc() {
    try {
        inc(indices, elements.size() - 1, 0);
        return true;
    } catch (IllegalStateException e) {
        return false;
    }
}

public Collection<T> next() {
    Collection<T> result = new ArrayList<T>(indices.length);
    for(int i=indices.length-1; i>=0; i--) {
        result.add(elements.get(indices[i]));
    }
    hasNext = inc();
    return result;
}

public void remove() {
    throw new UnsupportedOperationException();
}

}

I've had the same problem today, of generating all k-sized subsets of a n-sized set.

I had a recursive algorithm, written in Haskell, but the problem required that I wrote a new version in Java.
In Java, I thought I'd probably have to use memoization to optimize recursion. Turns out, I found a way to do it iteratively. I was inspired by this image, from Wikipedia, on the article about Combinations.

Method to calculate all k-sized subsets:

public static int[][] combinations(int k, int[] set) {
    // binomial(N, K)
    int c = (int) binomial(set.length, k);
    // where all sets are stored
    int[][] res = new int[c][Math.max(0, k)];
    // the k indexes (from set) where the red squares are
    // see image above
    int[] ind = k < 0 ? null : new int[k];
    // initialize red squares
    for (int i = 0; i < k; ++i) { ind[i] = i; }
    // for every combination
    for (int i = 0; i < c; ++i) {
        // get its elements (red square indexes)
        for (int j = 0; j < k; ++j) {
            res[i][j] = set[ind[j]];
        }
        // update red squares, starting by the last
        int x = ind.length - 1;
        boolean loop;
        do {
            loop = false;
            // move to next
            ind[x] = ind[x] + 1;
            // if crossing boundaries, move previous
            if (ind[x] > set.length - (k - x)) {
                --x;
                loop = x >= 0;
            } else {
                // update every following square
                for (int x1 = x + 1; x1 < ind.length; ++x1) {
                    ind[x1] = ind[x1 - 1] + 1;
                }
            }
        } while (loop);
    }
    return res;
}

Method for the binomial:
(Adapted from Python example, from Wikipedia)

private static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) {    // take advantage of symmetry
        k = n - k;
    }
    long c = 1;
    for (int i = 1; i < k+1; ++i) {
        c = c * (n - (k - i));
        c = c / i;
    }
    return c;
}

Of course, combinations will always have the problem of space, as they likely explode.
In the context of my own problem, the maximum possible is about 2,000,000 subsets. My machine calculated this in 1032 milliseconds.

Inspired by afsantos's answer :-)... I decided to write a C# .NET implementation to generate all subset combinations of a certain size from a full set. It doesn't need to calc the total number of possible subsets; it detects when it's reached the end. Here it is:

public static List<object[]> generateAllSubsetCombinations(object[] fullSet, ulong subsetSize) {
    if (fullSet == null) {
        throw new ArgumentException("Value cannot be null.", "fullSet");
    }
    else if (subsetSize < 1) {
        throw new ArgumentException("Subset size must be 1 or greater.", "subsetSize");
    }
    else if ((ulong)fullSet.LongLength < subsetSize) {
        throw new ArgumentException("Subset size cannot be greater than the total number of entries in the full set.", "subsetSize");
    }

    // All possible subsets will be stored here
    List<object[]> allSubsets = new List<object[]>();

    // Initialize current pick; will always be the leftmost consecutive x where x is subset size
    ulong[] currentPick = new ulong[subsetSize];
    for (ulong i = 0; i < subsetSize; i++) {
        currentPick[i] = i;
    }

    while (true) {
        // Add this subset's values to list of all subsets based on current pick
        object[] subset = new object[subsetSize];
        for (ulong i = 0; i < subsetSize; i++) {
            subset[i] = fullSet[currentPick[i]];
        }
        allSubsets.Add(subset);

        if (currentPick[0] + subsetSize >= (ulong)fullSet.LongLength) {
            // Last pick must have been the final 3; end of subset generation
            break;
        }

        // Update current pick for next subset
        ulong shiftAfter = (ulong)currentPick.LongLength - 1;
        bool loop;
        do {
            loop = false;

            // Move current picker right
            currentPick[shiftAfter]++;

            // If we've gotten to the end of the full set, move left one picker
            if (currentPick[shiftAfter] > (ulong)fullSet.LongLength - (subsetSize - shiftAfter)) {
                if (shiftAfter > 0) {
                    shiftAfter--;
                    loop = true;
                }
            }
            else {
                // Update pickers to be consecutive
                for (ulong i = shiftAfter+1; i < (ulong)currentPick.LongLength; i++) {
                    currentPick[i] = currentPick[i-1] + 1;
                }
            }
        } while (loop);
    }

    return allSubsets;
}

This solution worked for me:

 private static void findSubsets(int array[])
    {
      int numOfSubsets = 1 << array.length; 

      for(int i = 0; i < numOfSubsets; i++)
     {
        int pos = array.length - 1;
       int bitmask = i;

       System.out.print("{");
       while(bitmask > 0)
       {
        if((bitmask & 1) == 1)
         System.out.print(array[pos]+",");
        bitmask >>= 1;
        pos--;
       }
       System.out.print("}");
     }
    }

Swift implementation:

Below are two variants on the answer provided by afsantos.

The first implementation of the combinations function mirrors the functionality of the original Java implementation.

The second implementation is a general case for finding all combinations of k values from the set [0, setSize). If this is really all you need, this implementation will be a bit more efficient.

In addition, they include a few minor optimizations and a smidgin logic simplification.

/// Calculate the binomial for a set with a subset size
func binomial(setSize: Int, subsetSize: Int) -> Int
{
    if (subsetSize <= 0 || subsetSize > setSize) { return 0 }

    // Take advantage of symmetry
    var subsetSizeDelta = subsetSize
    if (subsetSizeDelta > setSize - subsetSizeDelta)
    {
        subsetSizeDelta = setSize - subsetSizeDelta
    }

    // Early-out
    if subsetSizeDelta == 0 { return 1 }

    var c = 1
    for i in 1...subsetSizeDelta
    {
        c = c * (setSize - (subsetSizeDelta - i))
        c = c / i
    }
    return c
}

/// Calculates all possible combinations of subsets of `subsetSize` values within `set`
func combinations(subsetSize: Int, set: [Int]) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > set.count { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: set.count, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of set indices
    var subsetIndices = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        var comboArr = [Int]()
        comboArr.reserveCapacity(subsetSize)
        for j in subsetIndices { comboArr.append(set[j]) }
        combos.append(comboArr)

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetIndices[x] = subsetIndices[x] + 1

            // If crossing boundaries, move previous
            if (subsetIndices[x] > set.count - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetIndices[x1] = subsetIndices[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}


/// Calculates all possible combinations of subsets of `subsetSize` values within a set
/// of zero-based values for the set [0, `setSize`)
func combinations(subsetSize: Int, setSize: Int) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > setSize { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: setSize, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of elements
    var subsetValues = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        combos.append([Int](subsetValues))

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetValues[x] = subsetValues[x] + 1

            // If crossing boundaries, move previous
            if (subsetValues[x] > setSize - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetValues[x1] = subsetValues[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow