Question

I wrote code that numerically evaluates Legendre polynomials up to some high n-th order. For example:

....
case 8 
p = (6435*x.^8-12012*x.^6+6930*x.^4-1260*x.^2+35)/128; return
case 9 
...

If the vector x is long this can become slow. I saw that there is a performance difference between, say, x.^4 and x.*x.*x.*x and thought I could use this to improve my code. I've used timeit and found that for:

x = linspace(0,10,1e6);
f1 = @() power(x,4);
f2 = @() x.^4;
f3 = @() x.^2.^2;
f4 = @() x.*x.*x.*x;

f4 is faster than the rest by a factor of 2. However, when I go to x.^6 there is very little difference between (x.*x.*x).^2 and x.*x.*x.*x.*x.*x (while all other options are slower).

Is there a way to tell in advance which will be the most efficient way to take a power of a vector? Can you explain why there is such a big difference in performance?


Solution

This is not exactly an answer to your question, but it may solve your problem:

x2 = x.*x; % or x.^2 or power(x,2), whichever is most efficient
p = ((((6435*x2-12012).*x2+6930).*x2-1260).*x2+35)/128;

This way you compute the power just once, and only with exponent 2. This trick (essentially Horner's scheme in x2) can be applied to all Legendre polynomials; for the odd-degree polynomials the Horner expression is additionally multiplied by one factor of x, as in the sketch below.
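For example, a minimal sketch (my addition, using the standard P9 coefficients) of the same trick for an odd degree:

x  = linspace(-1,1,1e6);
x2 = x.*x;                       % square once, reuse everywhere
% P9(x) = (12155x^9 - 25740x^7 + 18018x^5 - 4620x^3 + 315x)/128,
% written as x times a Horner-evaluated polynomial in x2:
p9 = x.*((((12155*x2-25740).*x2+18018).*x2-4620).*x2+315)/128;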

OTHER TIPS

Here are some thoughts:

power(x,4) and x.^4 are equivalent (just read the doc).

x.*x.*x.*x is probably optimized to something like x.^2.^2


x.^2.^2 is probably evaluated as: Take the square of each element (fast), and take the square of that again (fast again).

x.^4 is probably directly evaluated as: Take the fourth power of each element (slow).

It is not so strange that 2 fast operations take less time than 1 slow operation. It is just too bad that this optimization is not performed in the x.^4 case; perhaps it doesn't always work, or it comes at a cost (input checking, memory?).
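As a quick check (my addition): the .^ operator associates left to right, so x.^2.^2 really is two successive squarings of each element:

x = linspace(0,10,1e6);
assert(isequal(x.^2.^2, (x.^2).^2));   % left-to-right associativity
max(abs(x.^2.^2 - x.^4))               % only rounding-level differences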


About the timings: there is actually much more than a factor of 2 difference!

Because you call them through anonymous functions here, the function-call overhead is added in each case, making the relative differences smaller:

y=x;tic,power(x,4);toc
y=x;tic,x.^4;toc
y=x;tic,x.^2.^2;toc
y=x;tic,x.*x.*x.*x;toc

will give:

Elapsed time is 0.034826 seconds.
Elapsed time is 0.029186 seconds.
Elapsed time is 0.003891 seconds.
Elapsed time is 0.003840 seconds.

So, it is nearly a factor of 10 difference. However, note that the time difference in seconds is still minor, so for most practical applications I would just go for the simple syntax.
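For repeatable numbers, the same comparison can be run with timeit; a minimal sketch (my addition; note that timeit times the anonymous-function call, so it includes the overhead mentioned above):

x = linspace(0,10,1e6);
fprintf('power(x,4): %.2f ms\n', 1e3*timeit(@() power(x,4)));
fprintf('x.^4      : %.2f ms\n', 1e3*timeit(@() x.^4));
fprintf('x.^2.^2   : %.2f ms\n', 1e3*timeit(@() x.^2.^2));
fprintf('x.*x.*x.*x: %.2f ms\n', 1e3*timeit(@() x.*x.*x.*x));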

It seems as though MathWorks has special-cased squares in its power function (unfortunately, it's all built-in closed source that we cannot see). In my testing on R2013b, it appears as though .^, power, and realpow use the same algorithm. For squares, I believe they have special-cased it to be x.*x.

1.0x (4.4ms):   @()x.^2
1.0x (4.4ms):   @()power(x,2)
1.0x (4.5ms):   @()x.*x
1.0x (4.5ms):   @()realpow(x,2)
6.1x (27.1ms):  @()exp(2*log(x))

For cubes, the story is different. They're no longer special-cased. Again, .^, power, and realpow are all similar, but much slower this time:

1.0x (4.5ms):   @()x.*x.*x
1.0x (4.6ms):   @()x.*x.^2
5.9x (26.9ms):  @()exp(3*log(x))
13.8x (62.3ms): @()power(x,3)
14.0x (63.2ms): @()x.^3
14.1x (63.7ms): @()realpow(x,3)

Let's jump up to the 16th power to see how these algorithms scale:

1.0x (8.1ms):   @()x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x
2.2x (17.4ms):  @()x.^2.^2.^2.^2
3.5x (27.9ms):  @()exp(16*log(x))
7.9x (63.8ms):  @()power(x,16)
7.9x (63.9ms):  @()realpow(x,16)
8.3x (66.9ms):  @()x.^16

So: .^, power, and realpow all run in constant time with respect to the exponent, unless it was special-cased (-1 also appears to have been special-cased). The exp(n*log(x)) trick is also constant time with respect to the exponent, and faster. The only result I don't quite understand is why repeated squaring is slower than repeated multiplication.
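A minimal sketch (my addition) of the exp(n*log(x)) trick; note that it assumes x > 0 (log(0) gives -Inf and negative inputs go complex) and typically carries a little extra rounding error:

x  = linspace(1,10,1e6);
y1 = x.^16;
y2 = exp(16*log(x));
max(abs(y1-y2)./y1)   % relative error, expected to be near eps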

As expected, increasing the size of x by a factor of 100 increases the time similarly for all algorithms.

So, moral of the story? When using scalar integer exponents, always do the multiplication yourself. There's a whole lot of smarts in power and friends (the exponent can be floating point, a vector, etc.). The only exceptions are where MathWorks has done the optimization for you; in R2013b, that seems to be x.^2 and x.^(-1). Hopefully they'll add more as time goes on. But, in general, exponentiation is hard and multiplication is easy. In performance-sensitive code, I don't think you can go wrong by always typing x.*x.*x.*x (a general helper along these lines is sketched after the benchmark function below). Of course, in your case, follow Luis' advice and make use of the intermediate results within each term!

function powerTest(x)
% Benchmark several ways of raising the elements of x to the 16th power.

f{1} = @() x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x.*x;
f{2} = @() x.^2.^2.^2.^2;
f{3} = @() exp(16.*log(x));
f{4} = @() x.^16;
f{5} = @() power(x,16);
f{6} = @() realpow(x,16);

t = zeros(1,length(f));          % preallocate the timing results
for i = 1:length(f)
    t(i) = timeit(f{i});
end

[t,idxs] = sort(t);              % fastest first
fcns = f(idxs);

for i = 1:length(fcns)
    fprintf('%.1fx (%.1fms):\t%s\n',t(i)/t(1),t(i)*1e3,func2str(fcns{i}));
end
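Finally, a hedged sketch (my addition; multpow is a hypothetical name, not from the answers above): if you want the multiply-it-yourself approach without writing out long chains by hand, a small loop of elementwise multiplications handles any positive integer exponent:

function y = multpow(x, n)
% MULTPOW Elementwise x.^n for a positive integer scalar n, built as a
% chain of n-1 elementwise multiplications -- the approach that was
% fastest in the timings above. Hypothetical helper, offered as a sketch.
assert(isscalar(n) && n == floor(n) && n >= 1);
y = x;
for k = 2:n
    y = y .* x;
end
end

For example, multpow(x,16) computes the same result as the hand-written chain in powerTest above.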
Licensed under: CC-BY-SA with attribution