Generate all binary strings of length n with k bits set

https://stackoverflow.com/questions/1851134

13-09-2019
|

Question

What's the best algorithm to find all binary strings of length n that contain k bits set? For example, if n=4 and k=3, there are...

I need a good way to generate these given any n and any k so I'd prefer it to be done with strings.

Solution

This method will generate all integers with exactly N '1' bits.

From https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation

Compute the lexicographically next bit permutation

Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a lexicographical sense. For example, if N is 3 and the bit pattern is 00010011, the next patterns would be 00010101, 00010110, 00011001, 00011010, 00011100, 00100011, and so forth. The following is a fast way to compute the next permutation.
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits

unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for x86, the intrinsic is _BitScanForward. These both emit a bsf instruction, but equivalents may be available for other architectures. If not, then consider using one of the methods for counting the consecutive zero bits mentioned earlier. Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros.
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.

OTHER TIPS

Python

import itertools

def kbits(n, k):
    result = []
    for bits in itertools.combinations(range(n), k):
        s = ['0'] * n
        for bit in bits:
            s[bit] = '1'
        result.append(''.join(s))
    return result

print kbits(4, 3)

Output: ['1110', '1101', '1011', '0111']

Explanation:

Essentially we need to choose the positions of the 1-bits. There are n choose k ways of choosing k bits among n total bits. itertools is a nice module that does this for us. itertools.combinations(range(n), k) will choose k bits from [0, 1, 2 ... n-1] and then it's just a matter of building the string given those bit indexes.

Since you aren't using Python, look at the pseudo-code for itertools.combinations here:

http://docs.python.org/library/itertools.html#itertools.combinations

Should be easy to implement in any language.

Forget about implementation ("be it done with strings" is obviously an implementation issue!) -- think about the algorithm, for Pete's sake... just as in, your very first TAG, man!

What you're looking for is all combinations of K items out of a set of N (the indices, 0 to N-1 , of the set bits). That's obviously simplest to express recursively, e.g., pseudocode:

combinations(K, setN):
  if k > length(setN): return "no combinations possible"
  if k == 0: return "empty combination"
  # combinations INcluding the first item:
  return (first-item-of setN) combined combinations(K-1, all-but-first-of setN)
   union combinations(K-1, all-but-first-of setN)

i.e., the first item is either present or absent: if present, you have K-1 left to go (from the tail aka all-but-firs), if absent, still K left to go.

Pattern-matching functional languages like SML or Haskell may be best to express this pseudocode (procedural ones, like my big love Python, may actually mask the problem too deeply by including too-rich functionality, such as itertools.combinations, which does all the hard work for you and therefore HIDES it from you!).

What are you most familiar with, for this purpose -- Scheme, SML, Haskell, ...? I'll be happy to translate the above pseudocode for you. I can do it in languages such as Python too, of course -- but since the point is getting you to understand the mechanics for this homework assignment, I won't use too-rich functionality such as itertools.combinations, but rather recursion (and recursion-elimination, if needed) on more obvious primitives (such as head, tail, and concatenation). But please DO let us know what pseudocode-like language you're most familiar with! (You DO understand that the problem you state is identically equipotent to "get all combinations of K items out or range(N)", right?).

This C# method returns an enumerator that creates all combinations. As it creates the combinations as you enumerate them it only uses stack space, so it's not limited by memory space in the number of combinations that it can create.

This is the first version that I came up with. It's limited by the stack space to a length of about 2700:

static IEnumerable<string> BinStrings(int length, int bits) {
  if (length == 1) {
    yield return bits.ToString();
  } else {
    if (length > bits) {
      foreach (string s in BinStrings(length - 1, bits)) {
        yield return "0" + s;
      }
    }
    if (bits > 0) {
      foreach (string s in BinStrings(length - 1, bits - 1)) {
        yield return "1" + s;
      }
    }
  }
}

This is the second version, that uses a binary split rather than splitting off the first character, so it uses the stack much more efficiently. It's only limited by the memory space for the string that it creates in each iteration, and I have tested it up to a length of 10000000:

static IEnumerable<string> BinStrings(int length, int bits) {
  if (length == 1) {
    yield return bits.ToString();
  } else {
    int first = length / 2;
    int last = length - first;
    int low = Math.Max(0, bits - last);
    int high = Math.Min(bits, first);
    for (int i = low; i <= high; i++) {
      foreach (string f in BinStrings(first, i)) {
        foreach (string l in BinStrings(last, bits - i)) {
          yield return f + l;
        }
      }
    }
  }
}

One problem with many of the standard solutions to this problem is that the entire set of strings is generated and then those are iterated through, which may exhaust the stack. It quickly becomes unwieldy for any but the smallest sets. In addition, in many instances, only a partial sampling is needed, but the standard (recursive) solutions generally chop the problem into pieces that are heavily biased to one direction (eg. consider all the solutions with a zero starting bit, and then all the solutions with a one starting bit).

In many cases, it would be more desireable to be able to pass a bit string (specifying element selection) to a function and have it return the next bit string in such a way as to have a minimal change (this is known as a Gray Code) and to have a representation of all the elements.

Donald Knuth covers a whole host of algorithms for this in his Fascicle 3A, section 7.2.1.3: Generating all Combinations.

There is an approach for tackling the iterative Gray Code algorithm for all ways of choosing k elements from n at http://answers.yahoo.com/question/index?qid=20081208224633AA0gdMl with a link to final PHP code listed in the comment (click to expand it) at the bottom of the page.

One algorithm that should work:

generate-strings(prefix, len, numBits) -> String:
    if (len == 0):
        print prefix
        return
    if (len == numBits):
        print prefix + (len x "1")
    generate-strings(prefix + "0", len-1, numBits)
    generate-strings(prefix + "1", len-1, numBits)

Good luck!

In a more generic way, the below function will give you all possible index combinations for an N choose K problem which you can then apply to a string or whatever else:

def generate_index_combinations(n, k):

    possible_combinations = []

    def walk(current_index, indexes_so_far=None):
        indexes_so_far = indexes_so_far or []
        if len(indexes_so_far) == k:
            indexes_so_far = tuple(indexes_so_far)
            possible_combinations.append(indexes_so_far)
            return
        if current_index == n:
            return
        walk(current_index + 1, indexes_so_far + [current_index])
        walk(current_index + 1, indexes_so_far)

    if k == 0:
        return []
    walk(0)
    return possible_combinations

One possible 1.5-liner:

$ python -c 'import itertools; \
             print set([ n for n in itertools.permutations("0111", 4)])'

set([('1', '1', '1', '0'), ('0', '1', '1', '1'), ..., ('1', '0', '1', '1')])

.. where k is the number of 1s in "0111".

The itertools module explains equivalents for its methods; see the equivalent for the permutation method.

I would try recursion.

There are n digits with k of them 1s. Another way to view this is sequence of k+1 slots with n-k 0s distributed among them. That is, (a run of 0s followed by a 1) k times, then followed by another run of 0s. Any of these runs can be of length zero, but the total length needs to be n-k.

Represent this as an array of k+1 integers. Convert to a string at the bottom of the recursion.

Recursively call to depth n-k, a method that increments one element of the array before a recursive call and then decrements it, k+1 times.

At the depth of n-k, output the string.

int[] run = new int[k+1];

void recur(int depth) {
    if(depth == 0){
        output();
        return;
    }

    for(int i = 0; i < k + 1; ++i){
        ++run[i];
        recur(depth - 1);
        --run[i];
    }

public static void main(string[] arrrgghhs) {
    recur(n - k);
}

It's been a while since I have done Java, so there are probably some errors in this code, but the idea should work.

Are strings faster than an array of ints? All the solutions prepending to strings probably result in a copy of the string at each iteration.

So probably the most efficient way would be an array of int or char that you append to. Java has efficient growable containers, right? Use that, if it's faster than string. Or if BigInteger is efficient, it's certainly compact, since each bit only takes a bit, not a whole byte or int. But then to iterate over the bits you need to & mask a bit, and bitshift the mask to the next bit position. So probably slower, unless JIT compilers are good at that these days.

I would post this a a comment on the original question, but my karma isn't high enough. Sorry.

Python (functional style)

Using python's itertools.combinations you can generate all choices of k our of n and map those choices to a binary array with reduce

from itertools import combinations
from functools import reduce # not necessary in python 2.x

def k_bits_on(k,n):
       one_at = lambda v,i:v[:i]+[1]+v[i+1:]
       return [tuple(reduce(one_at,c,[0]*n)) for c in combinations(range(n),k)]

Example usage:

In [4]: k_bits_on(2,5)
Out[4]:
[(0, 0, 0, 1, 1),
 (0, 0, 1, 0, 1),
 (0, 0, 1, 1, 0),
 (0, 1, 0, 0, 1),
 (0, 1, 0, 1, 0),
 (0, 1, 1, 0, 0),
 (1, 0, 0, 0, 1),
 (1, 0, 0, 1, 0),
 (1, 0, 1, 0, 0),
 (1, 1, 0, 0, 0)]

Well for this question (where you need to iterate over all the submasks in increasing order of their number of set bits), which has been marked as a duplicate of this.

We can simply iterate over all the submasks add them to a vector and sort it according to the number of set bits.

vector<int> v;
for(ll i=mask;i>0;i=(i-1)&mask)
    v.push_back(i);
auto cmp = [](const auto &a, const auto &b){
    return __builtin_popcountll(a) < __builtin_popcountll(b);
}
v.sort(v.begin(), v.end(), cmp);

Another way would be to iterate over all the submasks N times and add a number to the vector if the number of set bits is equal to i in the ith iteration.

Both ways have complexity of O(n*2^n)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow