N Choose K Function Crashes Rcpp

https://stackoverflow.com//questions/25005216

20-12-2019

Question

I've written an 'n choose k' function in C++, which is interfacing with R via Rcpp. For some reason I am getting a 'divide by zero' runtime error. It happens when I try to evaluate 30 choose 2.

I've tried evaluating each line manually (with evalCpp), and I am still puzzled about where the divide by zero is happening. Maybe somebody could point this out to me or suggest a better way of writing n choose K?

Here is the code:

// [[Rcpp::export]]
int chooseC(int n, int k) {
  if (k > n) {
    std::cout << "Error. k cannot be greater than n." << std::endl;
    return 0;
  }
  int factN = std::tgamma(n + 1);
  int factK = std::tgamma(k + 1);
  int factDiff = std::tgamma(n - k + 1);
  return factN/(factK*factDiff);
}

Solution

Briefly:

There is no tgamma in std as far as I can see.
R itself has a choose function, so I would just do what is below.
R also has the gamma distribution etc., so you can do this by hand as well.
Why didn't you just print the values factN, factK, factDiff?

Simple Rcpp solution:

#include <Rcpp.h>

// [[Rcpp::export]]  
double chooseC(double n, double k) {
  return Rf_choose(n, k);
}

Example:

R> chooseC(5,2)     
[1] 10
R>

Edit: Following the comment by @Blastfurnace about tgamma() in the C++11 cmath header, here is a repaired version which works fine for me:

#include <Rcpp.h>
#include <cmath>

// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]] 
int chooseCtake2(int n, int k) {
  if (k > n) {
    Rcpp::stop("Error. k cannot be greater than n.");
  }
  int factN = std::tgamma(n + 1);
  int factK = std::tgamma(k + 1);
  int factDiff = std::tgamma(n - k + 1);
  return factN/(factK*factDiff); 
}

Example use:

R> sourceCpp("/tmp/chooseC.cpp")
R> chooseCtake2(2,3)
Error: Error. k cannot be greater than n.
R> chooseCtake2(5,2)
[1] 10
R>
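
Note that chooseCtake2 still stores each factorial in an int, so larger inputs, such as the 30 choose 2 from the question, can still overflow. A minimal sketch of one way around that, assuming a double return value is acceptable (as with Rf_choose): keep the tgamma results as double and round the quotient. The name chooseViaTgamma is made up for illustration. The code is plain C++ so that it compiles without R; exporting it through Rcpp would presumably need the same include and attribute lines as above.

```cpp
#include <cmath>

// Hypothetical helper (not from the answer above): n choose k computed
// entirely in double, so no factorial is ever squeezed into an int.
// Assumes 0 <= k <= n and n <= 170; beyond that n! exceeds the range of double.
double chooseViaTgamma(int n, int k) {
    double factN    = std::tgamma(n + 1.0);      // n!
    double factK    = std::tgamma(k + 1.0);      // k!
    double factDiff = std::tgamma(n - k + 1.0);  // (n - k)!
    // Round to absorb the small floating-point error in tgamma.
    return std::round(factN / (factK * factDiff));
}
```

The result is only trustworthy while it is small enough for a double to hold exactly; for anything larger, Rf_choose is likely the safer choice.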

OTHER TIPS

So std::tgamma(x) computes the gamma function of x. This function goes to infinity quite rapidly:

http://www.wolframalpha.com/share/clip?f=d41d8cd98f00b204e9800998ecf8427et5pmak8jtn

Already at x == 31 (that is, 30!, roughly 2.65e32), you have a number far larger than a 32-bit int can hold.

When converting this very large double back to int, the result is undefined behavior (4.9 Floating-integral conversions [conv.fpint]):

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

On my system this conversion (with an input of {30, 2}) results in an int with the value -2147483648. This is easily observed by inserting some print statements:

int
chooseC(int n, int k)
{
    if (k > n)
    {
        std::cout << "Error. k cannot be greater than n.\n";
        return 0;
    }
    int factN = std::tgamma(n + 1);
    std::cout << "factN = " << factN << '\n';
    int factK = std::tgamma(k + 1);
    std::cout << "factK = " << factK << '\n';
    int factDiff = std::tgamma(n - k + 1);
    std::cout << "factDiff = " << factDiff << '\n';
    std::cout << "factK*factDiff = " << factK*factDiff << '\n';
    return factN/(factK*factDiff); 
}

which for me outputs:

factN = -2147483648
factK = 2
factDiff = -2147483648
factK*factDiff = 0

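As can be seen, the UB ultimately results in a divide by zero: the product factK*factDiff (2 * -2147483648) itself overflows int and happens to come out as 0. Dividing by zero is also UB, and this sounds very similar to the behavior you are seeing.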
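
One way to catch this before it happens, sketched under the assumption that int is no wider than 32 bits: test whether the double lies inside the int range before converting it. The names fitsInInt and factorialAsDouble are hypothetical and introduced here only for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if x, truncated toward zero, is representable as int.
// Assumes int is at most 32 bits wide, so both bounds are exact as doubles.
bool fitsInInt(double x) {
    const double lo = static_cast<double>(std::numeric_limits<int>::min());
    const double hi = static_cast<double>(std::numeric_limits<int>::max());
    return std::isfinite(x) && x > lo - 1.0 && x < hi + 1.0;
}

// Hypothetical helper: n! as a double, via the gamma function.
double factorialAsDouble(int n) {
    return std::tgamma(n + 1.0);
}
```

With a 32-bit int, 12! = 479001600 still fits while 13! = 6227020800 does not, so converting a factorial to int is only safe for n <= 12.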

The solution to this problem is to compute things using only integral arithmetic, and in such a way that the intermediate computations do not overflow if the final result is representable in the integral type. This entails the use of a Greatest Common Divisor function.

Open source code which does this is available here:

http://howardhinnant.github.io/combinations.html

Search for "count_each_combination". Your chooseC can be coded in terms of count_each_combination like so:

int
chooseC(int n, int k)
{
    if (k > n)
    {
        std::cout << "Error. k cannot be greater than n.\n";
        return 0;
    }
    return count_each_combination(n-k, k);
}

Now chooseC(30, 2) will return 435. If count_each_combination is unable to store the result in an int, a std::overflow_error will be thrown.
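
If pulling in that library is not an option, the same idea can be sketched in a few lines. This is not the code from the link above, only an illustration of the GCD approach: the running product is multiplied by one factor and divided by one factor per step, with common divisors cancelled first. The names chooseExact and gcdInt are made up here, and returning 0 for k > n (instead of reporting an error) is an assumption that mirrors what R's choose does.

```cpp
#include <limits>
#include <stdexcept>

// Euclid's algorithm for non-negative ints.
static int gcdInt(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical sketch: n choose k using only int arithmetic.
// Intermediate values never exceed the final result, so the only possible
// overflow is a result that does not fit in int, which is reported by throwing.
int chooseExact(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;            // use symmetry to shorten the loop
    int result = 1;
    for (int i = 1; i <= k; ++i) {
        int num = n - k + i;             // next factor of the numerator
        int den = i;                     // next factor of the denominator
        const int g = gcdInt(num, den);
        num /= g;
        den /= g;
        result /= den;                   // exact: den divides result at this point
        if (result > std::numeric_limits<int>::max() / num) {
            throw std::overflow_error("chooseExact: result does not fit in int");
        }
        result *= num;
    }
    return result;
}
```

After step i the running value equals C(n - k + i, i), which is always an integer and never larger than the final answer.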

If you would like to constrain your chooseC to k == 2, perhaps just temporarily to better understand the algorithm, note that the formula for counting combinations is:

C(n, k) = n! / (k! * (n-k)!)

When k == 2, this simplifies to:

n*(n-1)/2

Now either n is even, or n-1 is even. You can discover which, and then divide that number by 2, with no truncation error, and then multiply the result by the number which wasn't divided by 2. Thus you get the exact result with no possibility of truncation error, nor intermediate overflow, using only integral arithmetic. This is the technique which count_each_combination uses, but generalized to any divisor, to deliver a result that is always exact if it can fit into the supplied integral type.
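
A minimal sketch of the k == 2 case just described, using the hypothetical name choose2. Returning 0 for n < 2 and throwing when the final product does not fit in int are assumptions added here, not part of the description above.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical sketch: n choose 2 using only int arithmetic.
int choose2(int n) {
    if (n < 2) return 0;                 // no way to pick 2 items from fewer than 2
    int a = n;
    int b = n - 1;
    // Exactly one of n and n-1 is even; halve that one, so no truncation occurs.
    if (a % 2 == 0) {
        a /= 2;
    } else {
        b /= 2;
    }
    // The product is the final result; it overflows only if the result itself does not fit.
    if (a > std::numeric_limits<int>::max() / b) {
        throw std::overflow_error("choose2: result does not fit in int");
    }
    return a * b;
}
```

For n == 30 this halves 30 to 15 and multiplies by 29, giving 435 without ever forming 30*29 or any factorial.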

Licensed under: CC-BY-SA with attribution
