What is the fastest way to do integer division?

https://stackoverflow.com/questions/10183872

01-06-2021
|

Domanda

Using scheme I have the need to use the following function. (All args are natural numbers [0, inf) )

(define safe-div
  (lambda (num denom safe)
    (if (zero? denom)
        safe
        (div num denom))))

However, this function is called quite often and is not performing well enough (speed wise). Is there a more efficient way of implementing the desired behavior (integer division of num and denom, return safe value if denom is zero)?

Notes, I am using Chez Scheme, however this is being used in a library that imports rnrs only, not full Chez.

Soluzione

For maximum performance, you need to get as close to the silicon as possible. Adding safety checks like this isn't going to do it, unless they get just-in-time compiled into ultra-efficient machine code by the scheme system.

I see two options. One is to create a native (i.e. foreign) implementation in C (or assembly) and invoke that. That might not be compatible with packaging it as a lambda, but then again, the dynamic nature of lambdas leads to notational efficiency but not necessarily runtime efficiency. (Function pointers excepted, there's a reason lambda expressions are not present in C, despite being many years older.) If you go this route, it might be best to take a step back and see if the larger processing of which safe-div is a part should be taken native. There's little point in speeding up the division at the center of a loop if everything around it is still slow.

Assuming that division by zero is expected to be rare, another approach is to just use div and hope its implementation is fast. Yes, this can lead to division by zero, but when it comes to speed, sometimes it is better to beg forgiveness than to ask permission. In other words, skip the checking before the division and just do it. If it fails, the scheme runtime should catch the division by zero fault and you can install an exception handler for it. This leads to slower code in the exceptional case and faster code in the normal case. Hopefully this tradeoff works out to a performance win.

Lastly, depending on what you are dividing by, it might be faster to multiply by the reciprocal than to perform an actual division. This requires fast reciprocal computation or revising earlier computations to yield a reciprocal directly. Since you are dealing with integers, the reciprocal would be stored in fixed-point, which is essentially 2^32 * 1/denom. Multiply this by num and shift right by 32 bits to get the quotient. This works out to a win because more processors these days have single cycle multiply instructions, but division executes in a loop on the chip, which is much slower. This might be overkill for your needs, but could be useful at some point.

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow