Numpy dot() and array casting performace optimization

https://stackoverflow.com/questions/16499636

21-04-2022
|

Question

I wonder if it is possible somehow optimize dotproduts and array casting in this part of code in numpy, which according to profiler takes 95% of runtime of my code. (I don't want to use f2py, cython or pyOpenCl, I'm just learning how to use numpy effectively )

def evalSeriesInBasi(a,B):
    Y   = dot(a,B[0])
    dY  = dot(a,B[1])
    ddY = dot(a,B[2]) 
    return array([Y,dY,ddY])

def evalPolarForces( R, O ):
    # numexpr doest seem to help it takes 3,644 vs. 1.910 with pure numpy
    G    = 1.0 / (R[0]**2)                     # Gravitational force
    F_O  = R[0] * O[2]     + 2 * R[1] * O[1]   # Angular Kinematic Force = Angular engine thrust 
    F_R  = R[0] * O[1]**2  +     R[2]          
    FTR  = F_R - G                             
    FT2  = F_O**2 + FTR**2                     # Square of Total engine Trust Force ( corespons to propelant consuption for power limited variable specific impulse engine)
    return array([F_O,F_R,G,FTR, FT2]) 

def evalTrajectoryPolar( Rt0, Ot0, Bs, Rc, Oc ):
    Rt = Rt0 +  evalSeriesInBasi(Rc,Bs)
    Ot = Ot0 +  evalSeriesInBasi(Oc,Bs)
    Ft = evalPolarForces( Rt, Ot )
    return Ot, Rt, Ft

where "B" is array of shape (3,32,128) where basis functions are stored, "a" are coefficients of these basis functions and all other arrays like Y,dY,ddY, F_O,F_R,G,FTR, FT2 are values of some function at 128 sampling points

according to profiler the most time takes numpy.core.multiarray.array and numpy.core._dotblas.dot

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
22970    2.969    0.000    2.969    0.000 {numpy.core.multiarray.array}
46573    0.926    0.000    0.926    0.000 {numpy.core._dotblas.dot}
 7656    0.714    0.000    2.027    0.000 basiset.py:61(evalPolarForces)
 7656    0.224    0.000    0.273    0.000 OrbitalOptCos_numpyOpt.py:43(fitnesFunc)
 7656    0.192    0.000    4.868    0.001 basiset.py:54(evalTrajectoryPolar)
  116    0.141    0.001    5.352    0.046 optimize.py:536(approx_fprime)
 7656    0.132    0.000    5.273    0.001 OrbitalOptCos_numpyOpt.py:63(evalFitness)
15312    0.101    0.000    2.649    0.000 basiset.py:28(evalSeriesInBasi)

Solution

You can speedup the calculation by removing array() calls, here is an example:

import numpy as np

B = np.random.rand(3, 32, 128)
a = np.random.rand(32, 32)

def f1(a, B):
    Y   = dot(a,B[0])
    dY  = dot(a,B[1])
    ddY = dot(a,B[2]) 
    return array([Y,dY,ddY])

def f2(a, B):
    result = np.empty((B.shape[0], a.shape[0], B.shape[-1]))
    for i in xrange(B.shape[0]):
        np.dot(a, B[i], result[i])
    return result

r1 = f1(a, B)
r2 = f2(a, B)
print np.allclose(r1, r2)

The result of f1() and f2() is the same, but the speed if different:

%timeit f1(a, B)
%timeit f2(a, B)

Result is:

1000 loops, best of 3: 1.34 ms per loop
10000 loops, best of 3: 135 µs per loop

OTHER TIPS

How about:

def f3(a, B):
    r = np.dot(a, B)
    return np.rollaxis(r, 1)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow