Question

I've always made the assumption that integer division was faster than floating point division, but I did some tests that seemed to prove otherwise.

import gmpy2, time, math

digits = 100000

scale = 10**digits  # Decimal precision
gmpy2.get_context().precision = int(math.log2(10) * digits)  # Binary precision

def start_timer():
    global start_time  
    start_time = time.time()

def print_timer():
    print("%s s" % (time.time() - start_time))

start_timer()
for i in range(1000):
    x = scale // 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / 3
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / gmpy2.mpfr(3)
print_timer()

The integer division took 0.17 seconds, the mpfr division took 0.06 seconds, and the division between two floating point numbers took 15.56 seconds.

My questions:

  1. Am I setting up this test correctly?
  2. Is mpfr division really more optimized than native division?
  3. Is division involving a floating point and an integer much faster than that involving two floating point numbers?

Solution

I'm using IPython to time some short examples and then I'll try to explain the results.

from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000000 loops, best of 3: 669 ns per loop
%timeit a/3
1000000 loops, best of 3: 464 ns per loop

get_context().precision=10000
a=mpfr(1);b=mpfr(3)

%timeit a/b
100000 loops, best of 3: 12.9 µs per loop
%timeit a/3
1000000 loops, best of 3: 1.33 µs per loop

get_context().precision=100000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 8.13 µs per loop

Notice that as the precision increases, the running time for a/b increases more rapidly than a/3. When calculating a/b, MPFR uses the full precision of both values and the running time is (roughly) O(n * ln(n)). When calculating a/3, MPFR uses a short, but exact, representation of 3 and the running time is (roughly) O(n). This explains why a/b is slower than a/3 for high precision. (n is the length of a in bits.)
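This scaling behavior is easy to reproduce. The sketch below (which assumes gmpy2 is installed; it skips the demo otherwise, and the exact timings will vary by machine) times both forms at increasing precisions:

```python
import timeit

try:
    from gmpy2 import mpfr, get_context
except ImportError:  # gmpy2 is optional; skip the demo if it is absent
    mpfr = None

def time_divisions(prec, number=1000):
    """Return average (a/b, a/3) timings in seconds at the given binary precision."""
    get_context().precision = prec
    a, b = mpfr(1), mpfr(3)
    t_full = timeit.timeit(lambda: a / b, number=number) / number   # n-bit by n-bit
    t_short = timeit.timeit(lambda: a / 3, number=number) / number  # n-bit by short exact 3
    return t_full, t_short

if mpfr is not None:
    for prec in (1000, 10000, 100000):
        t_full, t_short = time_divisions(prec)
        print(f"prec={prec:6d}  a/b: {t_full:.2e} s  a/3: {t_short:.2e} s")
```

As the precision grows, the gap between the two timings should widen, matching the O(n * ln(n)) versus O(n) behavior described above.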

When Python calculates scale//3, it takes advantage of the fact that 3 will fit into a single digit and running time is linear in the length of scale. This is effectively the same calculation as a/3 but since the underlying GMP library is faster than Python, a/3 is computed faster than scale//3.
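The small-divisor path can be observed with plain Python integers (the 30-bit internal digit size is a CPython implementation detail, and the sizes below are arbitrary):

```python
import timeit

# CPython stores big integers in 30-bit internal "digits"; a divisor that
# fits in one digit (like 3) takes a single linear pass over the dividend.
for digits in (25_000, 50_000, 100_000):
    scale = 10**digits
    t = timeit.timeit(lambda: scale // 3, number=200) / 200
    print(f"{digits:7d} decimal digits: {t:.2e} s per division")

# Sanity check: quotient and remainder satisfy the division identity.
q, r = divmod(10**1000, 3)
assert q * 3 + r == 10**1000
```

Doubling the length of `scale` should roughly double the per-division time, consistent with linear cost.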

Here is a short example of the difference in performance between Python and GMP.

from gmpy2 import mpz
scale = 10**100000

%timeit scale//3
10000 loops, best of 3: 162 µs per loop

scale = mpz(scale)

%timeit scale//3
100000 loops, best of 3: 19 µs per loop

You were measuring the performance of an n by n division versus an n by k division when you compared a/b and a/3. (n is the length of a in bits, and k is much, much smaller than n.) When you compared scale//3 and a/3, you were comparing a simple, straightforward division implementation with a highly-optimized implementation.
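The n by n versus n by k distinction can be seen even with plain Python integers (the operand sizes below are arbitrary, and absolute timings are machine-dependent):

```python
import timeit

n_bits = 300_000
a = (1 << n_bits) - 1                # an n-bit dividend
b_small = 3                          # a k-bit divisor with k << n
b_large = (1 << (n_bits // 2)) + 1   # a divisor comparable in size to a

# n-by-k division is linear in n; n-by-n division is much more expensive.
t_small = timeit.timeit(lambda: a // b_small, number=10) / 10
t_large = timeit.timeit(lambda: a // b_large, number=10) / 10
print(f"n-by-k division:   {t_small:.2e} s")
print(f"n-by-n/2 division: {t_large:.2e} s")
```

On typical hardware the large-divisor case is orders of magnitude slower, even though both quotients are exact integer results.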

Implementation note: In the current unstable development branch, a/3 calls mpfr_div_ui directly. This eliminates the creation of a temporary object by MPFR. This improves the performance as shown below.

from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000000 loops, best of 3: 593 ns per loop
%timeit a/3
1000000 loops, best of 3: 231 ns per loop

get_context().precision=10000
a=mpfr(1); b=mpfr(3)

%timeit a/b
100000 loops, best of 3: 12.7 µs per loop
%timeit a/3
1000000 loops, best of 3: 927 ns per loop

get_context().precision=100000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 6.77 µs per loop

Other answers

A note about the GNU MPFR implementation (I'm an MPFR developer, though I haven't really worked on the division): selecting the best algorithm for multiplication and division is quite difficult, because there are various parameters (the precisions of the inputs and output, and in particular whether the inputs can be represented with a smaller precision because of trailing zeros), and some cases may be more difficult to round than others. Moreover, the algorithms, and thus the timings, may change from one release to another, improving some cases while making others slower. Even recently (two months ago), we had a discussion about whether to add special handling of constant powers of two for the integer argument in mpfr_mul_ui and mpfr_div_ui.

If you really want to compare integer division with MPFR floating-point division, you should do the comparison against GMP's integer division. MPFR is based on GMP's division, but not naively. The best way to know what MPFR is doing is to use MPFR logging (this may require rebuilding with --enable-logging) together with the corresponding environment variables. Note that when logging is enabled in the MPFR build, MPFR may be a bit slower even if logging is not used.

Floating-point division is typically faster than integer division on the CPU. One could speculate that this is because the FPU is more optimized for the operation, or because the floating-point representation makes division easier, but whatever the reason, it doesn't change the fact. In the end, the only way to get concrete answers to your second and third questions is to test them. And yes, your test looks OK to me.

If I had to venture a guess, I think the case where you divide an MPFR number by a plain integer is faster because GMP can take advantage of the integer's small, exact representation when computing the division.

License: CC-BY-SA with attribution