Question

Cheers,

I know you can get the number of combinations with the following formula (without repetition, and where order is not important):

// Choose r from n

n! / r!(n - r)!

However, I don't know how to implement this in C++, since for instance with

n = 52

n! = 8.0658175170943878571660636856404e+67

the number gets way too big even for unsigned __int64 (or unsigned long long). Is there some workaround to implement the formula without any third-party "bigint" libraries?


Solution

Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for an unsigned long long:

unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}

This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
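
A quick sanity check (a minimal sketch, assuming the choose() above is in scope; 52 choose 5 is 2,598,960):

#include <iostream>

int main() {
    std::cout << choose(52, 5) << '\n';  // expected: 2598960
    std::cout << choose(52, 0) << '\n';  // expected: 1
    std::cout << choose(5, 52) << '\n';  // k > n, expected: 0
}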

UPDATE: There's a small possibility that the algorithm will overflow on the line:

r *= n--;

for very large n. A naive upper bound is sqrt(std::numeric_limits<unsigned long long>::max()), which means an n less than roughly 4,000,000,000.

OTHER TIPS

From Andreas' answer:

Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for an unsigned long long:

unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}

This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.

UPDATE: There's a small possibility that the algorithm will overflow on the line:

r *= n--;

for very large n. A naive upper bound is sqrt(std::numeric_limits<unsigned long long>::max()), which means an n less than roughly 4,000,000,000.

Consider n == 67 and k == 33. The above algorithm overflows with a 64-bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. The above algorithm is silent about its overflow; choose(67, 33) returns:

8,829,174,638,479,413

A believable but incorrect answer.

However, the above algorithm can be slightly modified so that it never overflows as long as the final answer is representable.

The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:

r = r * n / d;
--n;

For this to be exact, it means that if you expanded r, n and d into their prime factorizations, you could cancel out d completely and be left with a modified value for n, call it t; the computation of r then becomes simply:

// compute t from r, n and d
r = r * t;
--n;

A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:

unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
--n;

Now we can do the same thing with d_temp and n (find the greatest common divisor). However, since we know a priori that r * n / d is exact, we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can simply divide n by d_temp:

unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
// now one can divide n by d/g without truncation
unsigned long long t = n / d_temp;
r = r * t;
--n;

Cleaning up:

#include <limits>     // std::numeric_limits
#include <stdexcept>  // std::invalid_argument, std::overflow_error

// greatest common divisor via the Euclidean algorithm
unsigned long long
gcd(unsigned long long x, unsigned long long y)
{
    while (y != 0)
    {
        unsigned long long t = x % y;
        x = y;
        y = t;
    }
    return x;
}

unsigned long long
choose(unsigned long long n, unsigned long long k)
{
    if (k > n)
        throw std::invalid_argument("invalid argument in choose");
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d, --n)
    {
        unsigned long long g = gcd(r, d);
        r /= g;
        unsigned long long t = n / (d / g);
        // detect overflow before it happens instead of silently wrapping
        if (r > std::numeric_limits<unsigned long long>::max() / t)
           throw std::overflow_error("overflow in choose");
        r *= t;
    }
    return r;
}

Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
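
For example (a minimal usage sketch, assuming the gcd/choose definitions above; the two expected outcomes are exactly the ones described in the previous paragraph):

#include <iostream>

int main()
{
    std::cout << choose(67, 33) << '\n';  // prints 14226520737620288370 without overflowing
    try {
        std::cout << choose(68, 33) << '\n';
    } catch (const std::overflow_error& e) {
        std::cout << "overflow: " << e.what() << '\n';  // choose(68, 33) does not fit in 64 bits
    }
}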

The following routine computes n-choose-k using the recursive definition and memoization. The routine is fast and accurate:

#include <algorithm>  // std::fill_n
#include <cstddef>    // std::size_t

inline unsigned long long n_choose_k(const unsigned long long& n,
                                     const unsigned long long& k)
{
   if (n  < k) return 0;
   // (k == 0 must be checked before n == 0 so that n_choose_k(0, 0) == 1)
   if (0 == k) return 1;
   if (0 == n) return 0;
   if (n == k) return 1;
   if (1 == k) return n;
   typedef unsigned long long value_type;
   // n x n memoization table, zero-initialized ("0" means "not yet computed")
   value_type* table = new value_type[static_cast<std::size_t>(n * n)];
   std::fill_n(table,n * n,0);
   class n_choose_k_impl
   {
   public:

      n_choose_k_impl(value_type* table,const value_type& dimension)
      : table_(table),
        dimension_(dimension)
      {}

      inline value_type& lookup(const value_type& n, const value_type& k)
      {
         return table_[dimension_ * n + k];
      }

      inline value_type compute(const value_type& n, const value_type& k)
      {
         // Pascal's rule: C(n, k) = C(n - 1, k - 1) + C(n - 1, k)
         if ((0 == k) || (k == n))
            return 1;
         value_type v1 = lookup(n - 1,k - 1);
         if (0 == v1)
            v1 = lookup(n - 1,k - 1) = compute(n - 1,k - 1);
         value_type v2 = lookup(n - 1,k);
         if (0 == v2)
            v2 = lookup(n - 1,k) = compute(n - 1,k);
         return v1 + v2;
      }

      value_type* table_;
      value_type dimension_;
   };
   value_type result = n_choose_k_impl(table,n).compute(n,k);
   delete [] table;
   return result;
}
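
Note that the scratch table holds n * n values of unsigned long long, i.e. about 8 * n * n bytes; for n = 52 that is roughly 21 KB, but the memory cost grows quadratically with n.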

Remember that

n! / (n - r)! = n * (n - 1) * ... * (n - r + 1)

so it's way smaller than n!. So the solution is to evaluate n * (n - 1) * ... * (n - r + 1) (and then divide by r!) instead of first calculating n! and then dividing it.
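
For example, with n = 52 and r = 5: 52 * 51 * 50 * 49 * 48 = 311,875,200, and dividing by 5! = 120 gives 2,598,960, which is nowhere near overflowing 64 bits.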

Of course, it all depends on the relative magnitudes of n and r: if r is relatively big compared to n, then it still won't fit.

Well, I have to answer my own question. I was reading about Pascal's triangle and noticed by accident that we can calculate the number of combinations with it:

#include <iostream>
#include <boost/cstdint.hpp>

boost::uint64_t Combinations(unsigned int n, unsigned int r)
{
    if (r > n)
        return 0;

    if (r == 0)
        return 1;

    /** We can use Pascal's triangle to determine the number
      * of combinations. Moving along a single row, each entry
      * follows from the previous one:
      *
      * v(r) = v(r - 1) * (n - r + 1) / r
      *
      * Since the triangle is symmetrical, we only need to calculate
      * up to the r-th column.
      */

    boost::uint64_t v = n--;

    for (unsigned int i = 2; i < r + 1; ++i, --n)
        v = v * n / i;

    return v;
}

int main()
{
    std::cout << Combinations(52, 5) << std::endl;
}
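
This prints 2598960, the number of possible 5-card hands from a 52-card deck.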

Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating the factorial.

Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them using the sieve, but then also to calculate how many of their multiples fall in the ranges [1, k] and [n-k+1, n]. The Sieve is essentially an O(n log log n) algorithm, but there is no multiplication done. The actual number of multiplications necessary once the prime factorization is found is at worst O(n log log n / log n), and there are probably faster ways than that.

prime_factors = []

n = 20
k = 10

composite = [True] * 2 + [False] * n

for p in xrange(n + 1):
    if composite[p]:
        continue

    q = p
    m = 1
    total_prime_power = 0
    prime_power = [0] * (n + 1)

    while True:

        prime_power[q] = prime_power[m] + 1
        r = q

        if q <= k:
            total_prime_power -= prime_power[q]

        if q > n - k:
            total_prime_power += prime_power[q]

        m += 1
        q += p

        if q > n:
            break

        composite[q] = True

    prime_factors.append([p, total_prime_power])

print prime_factors
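
The binomial coefficient itself is then the product of p**e over the reported [p, e] pairs; for the n = 20, k = 10 example above that product should come out to C(20, 10) = 184,756.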

Simplify the formula first, though. You don't want to do long division.

One of the shortest ways:

int nChoosek(int n, int k){
    if (k > n) return 0;
    if (k == 0) return 1;
    return nChoosek(n - 1, k) + nChoosek(n - 1, k - 1);
}
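
Keep in mind that without memoization this recursion makes on the order of nChoosek(n, k) calls, so it is only practical when the result itself is small.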

Using a dirty trick with a long double, it is possible to get the same accuracy as Howard Hinnant (and probably more):

#include <cmath>  // std::round

unsigned long long n_choose_k(int n, int k)
{
    long double f = n;
    for (int i = 1; i < k + 1; i++)
        f /= i;
    for (int i = 1; i < k; i++)
        f *= n - i;

    unsigned long long f_2 = std::round(f);

    return f_2;
}

The idea is to divide first by k! and then to multiply by n(n-1)...(n-k+1). The approximation through the double can be avoided by inverting the order of the for loop.
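
A quick check (a minimal sketch, assuming the n_choose_k above; the expected value comes from the exact routines earlier, and the result is small enough that long double rounding cannot disturb it):

#include <iostream>

int main()
{
    std::cout << n_choose_k(52, 5) << '\n';  // expected: 2598960
}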

If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:

// assuming n and r are given and <vector> is included
std::vector<unsigned long long> prev_row(n + 1), current_row(n + 1);

for (int i = 0; i <= n; i++) {
    for (int j = 0; j <= i; j++) {
        if (j == 0 || j == i) current_row[j] = 1;
        else current_row[j] = prev_row[j] + prev_row[j-1];
    }
    prev_row = current_row;
}
// result is now in current_row[r]

However, this algorithm is much slower than the multiplicative one. So perhaps you could use multiplication to generate all the cases you know are 'safe' and then use addition from there (or you could just use a BigInt library).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow