A friend of mine told me that the pow function is slower than simply multiplying the base by itself as many times as the exponent indicates. For example, according to him,

#include <stdio.h>
#include <math.h>

int main(void) {
    double e = 2.71828;
    double e2 = pow(e, 2.0);   /* square via the generic library call */
    printf("%le\n", e2);
    return 0;
}

is slower than

#include <stdio.h>

int main(void) {
    double e = 2.71828;
    double e2 = e * e;         /* square via a plain multiplication */
    printf("%le\n", e2);
    return 0;
}

As a novice, I would have thought both compile to code that runs at the same speed, and by that logic I would prefer the former for its brevity. So why is the former block of code slower than the latter?


Solution

pow(double, double) has to handle raising to any power, not just integer powers, let alone the specific case of 2. As such, it is far more complicated than a simple multiplication of two double values.

Other answers

Because the pow function must implement a more generic algorithm that works in all cases (in particular, it must be able to raise to any exponent representable by a double, including fractional and negative ones), while e*e is just a simple multiplication that boils down to one or two assembly instructions.
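To make the difference in scope concrete, here is a minimal sketch of the "multiply the base by itself" approach from the question. The name mul_pow is hypothetical and not part of any library. Its unsigned parameter cannot even express an exponent such as 0.5 or -1.5, which pow has to accept.

```c
/* Hypothetical helper (not a library function): raises base to a
   non-negative integer power by repeated multiplication. */
double mul_pow(double base, unsigned int exponent)
{
    double result = 1.0;                 /* base^0 is 1 */
    for (unsigned int i = 0; i < exponent; ++i)
        result *= base;                  /* one multiplication per step */
    return result;
}
```

Calling mul_pow(e, 2) performs the same e * e multiplication as the second program in the question, plus the loop overhead.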

Still, if the compiler is smart enough, it may replace your pow(e, 2.0) with e*e automatically anyway (well, actually, in your case it will probably just perform the whole computation at compile time).


Just for fun, I ran some tests: compiling the following code

#include <math.h>

double pow2(double value)
{
    return pow(value, 2.);
}

double knownpow2()
{
    double e=2.71828;
    return pow(e, 2.);
}

double valuexvalue(double value)
{
    return value*value;
}

double knownvaluexvalue()
{
    double e=2.71828;
    return e*e;
}

with g++ -O3 -c pow.c (g++ 4.7.3) and disassembling the output with objdump -d -M intel pow.o I get:

0000000000000000 <_Z4pow2d>:
   0:   f2 0f 59 c0             mulsd  xmm0,xmm0
   4:   c3                      ret    
   5:   66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
   c:   00 00 00 00 

0000000000000010 <_Z9knownpow2v>:
  10:   f2 0f 10 05 00 00 00    movsd  xmm0,QWORD PTR [rip+0x0]        # 18 <_Z9knownpow2v+0x8>
  17:   00 
  18:   c3                      ret    
  19:   0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]

0000000000000020 <_Z11valuexvalued>:
  20:   f2 0f 59 c0             mulsd  xmm0,xmm0
  24:   c3                      ret    
  25:   66 66 2e 0f 1f 84 00    data32 nop WORD PTR cs:[rax+rax*1+0x0]
  2c:   00 00 00 00 

0000000000000030 <_Z16knownvaluexvaluev>:
  30:   f2 0f 10 05 00 00 00    movsd  xmm0,QWORD PTR [rip+0x0]        # 38 <_Z16knownvaluexvaluev+0x8>
  37:   00 
  38:   c3                      ret    

So, where the compiler already knew all the values involved it just performed the computation at compile-time; and for both pow2 and valuexvalue it emitted a single mulsd xmm0,xmm0 (i.e. in both cases it boils down to the multiplication of the value with itself in a single assembly instruction).

Here is one (simple, heed the comment) pow implementation. Being generic, it involves a number of branches, a potential division, and calls to exp, log, modf, and so on.

Multiplication, on the other hand, is a single instruction (give or take) on most higher-end CPUs.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow