I need to calculate Stirling's approximation very fast

https://stackoverflow.com/questions/21717767

10-10-2022
|

سؤال

I'm writing a small library for statistical sampling which needs to run as fast as possible. In profiling I discovered that around 40% of the time taken in the function is spent computing Stirling's approximation for the logarithm of the factorial. I'm focusing my optimization efforts on that piece of it. Here's my code (which uses MPFR):

const double AL[8] =
{ 0.0, 0.0, 0.6931471806, 1.791759469, 3.178053830, 4.787491743,
    6.579251212, 8.525161361 };
void HGD::mpfr_afc(mpfr_t &ret, const mpfr_t &val){

    if(mpfr_cmp_ui(val, 7) <= 0){
        mpfr_set_d(ret, AL[mpfr_get_ui(val, MPFR_RNDN)], MPFR_RNDN);
    }else{
        mpfr_set(ret, val, MPFR_RNDN);
        mpfr_add_d(ret, ret, 0.5, MPFR_RNDN);
        mpfr_log(LV, val, MPFR_RNDN);
        mpfr_mul(ret, LV, ret, MPFR_RNDN);
        mpfr_sub(ret, ret, val, MPFR_RNDN);
        mpfr_add_d(ret, ret, 0.399089934, MPFR_RNDN);
    }
}

I have a couple different ideas:

Precompute more than the first 8 inputs to the function.
Optimize the math (use a coarser approximation for smaller precision)
Use multiple threads to compute on different inputs in parallel
Switch to native arithmetic when numbers can fit in machine data types

Are there other approaches I could take?

المحلول

Switch to native arithmetic when numbers can fit in machine data types

That would be my first attempt. MPFR is likely to be a performance killer.

It seems to me you want to compute the logarithm of n! which you are already approximating with Stirling's formula.

Note that n!=Gamma(n+1). There are (seemingly) highly optimized functions to compute both the Gamma function and its logarithm. For example:

I would roll my own coarser approximation only if all the above fails performance-wise.

نصائح أخرى

A couple of thoughts here. First, it occurs to me that using MPFR for this may be overkill. Any of the multi precision libraries have enormous overhead. Not just a lot of overhead, but enormous overhead. Second thought is that maybe you don't need to use the multi precision log function. Maybe you could get away with the standard log?

If you can't fit your computations in a double precision float, then parallelizing using threads or other methods will certainly help. You could try playing with compiler optimizations, but I've not seen real improvements trying that.

Last option you could try is manually allocating memory space so that MPFR has a fixed overhead. I've never tried it, so I don't know if it would help.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow