Question

I am rewriting an analysis code for Molecular Dynamics time series. Due to the huge amount of time steps (150 000 for each simulation run) which have to be analysed, it is very important that my code is as fast as possible.

The old code is very slow (actually it requires 300 to 500 times more time compared to my one) because it was written for the analysis of a few thousand PDB files and not a bunch full of different simulations (around 60), each one having 150 000 time steps. I know that C or Fortran would be the swiss army knife in this case but my experience with c is .....

Therefore I am trying to use as much as possible numpy/scipy routines for my python code. Because I've a license for the accelerated distribution of anaconda with the mkl, this is a really significant speedup.

Now I am facing a problem and I hope that I can explain it in a manner that you understand what i mean.

I have three arrays each one with a shape of (n, 3, 20). In the first row are all residuals of my peptide, commonly around 23 to 31. In the second row are coordinates in the order xyz and in the third row are some specific time steps.

Now I'am calculating the torsion for each residual at each time step. my code for the case of arrays with shape (n,3,1) its:

def fast_torsion(d1, d2, d3):
    tt = dot(d1, np.cross(d2, d3))
    tb = dot(d1, d1) * dot(d2, d2)
    torsion = np.zeros([len(d1), 1])
    for i in xrange(len(d1)):
        if tb[i] != 0:
            torsion[i] = tt[i]/tb[i]
    return torsion

Now I tried to use the same code for the arrays with the extended third axis but the cross product function produces the wrong values compared to the original slow code, which is using a for loop. I tried this code with my the big arrays it is around 10 to 20 times faster than a for loop solution and around 200 times fast than the old code.

What I am trying is that np.cross() only computes the cross product over the second (xyz) axis and iterates over the other two axis. In the case with the short third axis it works fine, but with the big arrays it only works for the first time step. I also tried the axis settings but I had no chance.

I can also use Cython or numba if this is the only solution for my problem.

P.S. Sorry for my english I hope you can understand everything.

Was it helpful?

Solution

np.crosshas axisa, axisb and axisc keyword arguments to select where in the input and output arguments are the vectors to be cross multiplied. I think you want to use:

np.cross(d2, d3, axisa=1, axisb=1, axisc=1)

If you don't include the axisc=1, the result of the multiplication will be at the end of the output array.

Also, you can avoid explicitly looping over your torsion array by doing:

torsion = np.zeros((len(d1), 1)
idx = (tb !=0)
torsion[idx] = tt[idx] / tb[idx]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top