I would like to offer 2 alternative solutions to unutbu's answer. What he writes is The Right Thing™ to do if you don't want to depend on Cython or on a JIT compiler and it is appropriate to generate the output St
in batch.
I grabbed generate_orig()
from his answer and turned the Python list St
into a numpy array:
import numpy as np
def generate_orig(T=1., nt=100000, lbd=500., mu=0, sigma=1., S0=0):
dt = T/nt
St = np.full(nt, fill_value=S0, dtype=np.float64)
sqrtdt = np.sqrt(dt)
dBt = np.random.normal(0, sqrtdt, nt)
for k in xrange(1, nt):
dSt = lbd * (mu - St[k-1]) * dt + sigma * dBt[k]
St[k] = St[k-1] + dSt
return St
Timings:
%timeit [generate_orig() for i in xrange(100)]
1 loops, best of 3: 25.4 s per loop
No improvements so far, same as before. However, with Numba, just by adding @autojit
:
import numpy as np
from numba import autojit
@autojit
def generate_orig(T=1., nt=100000, lbd=500., mu=0, sigma=1., S0=0):
# The rest is exactly the same as before
The timing drops:
%timeit [generate_orig(1., 100000, 500., 0, 1., 0) for i in xrange(100)]
1 loops, best of 3: 642 ms per loop
I think it's AWESOME! 40x speed-up just for adding @autojit
!
Cython
Here is a Cython version with typed memoryviews:
%%cython
# cython: infer_types=True
# cython: boundscheck=False
# cython: wraparound=False
import numpy as np
cimport numpy as np
def generate_cython(double T=1., int nt=100000, double lbd=500., double mu=0, double sigma=1., double S0=0):
cdef int k
cdef double dt, dSt
cdef double[:] vSt, vdBt
dt = T/nt
St = np.full(nt, fill_value=S0, dtype=np.float64)
vSt = St
vdBt = np.random.normal(0.0, np.sqrt(dt), nt)
for k in xrange(1, nt):
dSt = lbd * (mu - vSt[k-1]) * dt + sigma * vdBt[k]
vSt[k] = vSt[k-1] + dSt
return St
Timing:
%timeit [generate_cython(1., 100000, 500., 0, 1., 0) for i in xrange(100)]
1 loops, best of 3: 638 ms per loop
The code is exactly as fast as the Numba version (the tiny difference is just noise). However the code became ugly, all those type declarations make it clumsy. :( Well, not a disaster but still.
Both solutions give a 3x speed-up compared to unutbu's answer, which runs in 1.97 s on my machine. However, as I said at the beginning, his solution is the way to go if you don't want to depend on Cython or on Numba. (Both of them have drawbacks; it is understandable if someone wants to avoid such a dependence.)
What would happen if we applied either Numba or Cython to unutbu's solution? Would that result in an even faster code? No. With Numba, there is no difference. Cython makes things slightly worse. Well, perhaps a Cython guru could come up with a better solution...