Question

The number of combinations of k items which can be retrieved from N items is described by the following formula.

             N! 
c =  ___________________ 
       (k! * (N - k)!)

An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.

Optimize this formula to run with the smallest O time complexity

This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly. e.g. and a subsequent discussion on the topic on another forum.

http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000

I'll post some info/links from that discussion after some people take a stab at the solution.

Any language is acceptable.

Was it helpful?

Solution

Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.

Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.

If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:

  • 3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
  • 2 multiplications
  • 1 divisions

The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.

EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.

EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.

OTHER TIPS

Python: O(min[k,n-k]2)

def choose(n,k):
    k = min(k,n-k)
    p = q = 1
    for i in xrange(k):
        p *= n - i
        q *= 1 + i
    return p/q

Analysis:

  • The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
  • The cost of each multiplication will then also increase linearly.
  • This sum of all iterations becomes an arithmetic series over k.

My conclusion: O(k2)

If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×104884378.)

def choose(n,k):
    k = min(k,n-k)
    result = 1.0
    for i in xrange(k):
        result *= 1.0 * (n - i) / (1 + i)
    return result

I'd solve it in Mathematica:

Binomial[n, k]

Man, that was easy...

Python: approximation in O(1) ?

Using python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).

from decimal import Decimal

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma-funciton.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
  n = Decimal(str(n))
  k = Decimal(str(k))
  return gamma(n+1)/gamma(k+1)/gamma(n-k+1)

Example:

>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')

Any higher, and it will overflow. The exponent seems to be limited to 40000000.

Given a reasonable number of values for n and K, calculate them in advance and use a lookup table.

It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.

MATLAB:

  • The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)

    nchoosek(N,k)
    
  • My solution: 36 characters, O(min(k,N-k))

    a=min(k,N-k);
    prod(N-a+1:N)/prod(1:a)
    

I know this is a really old question but I struggled with a solution to this problem for a long while until I found a really simple one written in VB 6 and after porting it to C#, here is the result:

public int NChooseK(int n, int k)
{
    var result = 1;
    for (var i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i;
    }

    return result;
}

The final code is so simple you won't believe it will work until you run it.

Also, the original article gives some nice explanation on how he reached the final algorithm.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top