Question

I have an m x n array, a, where m and n are integers with m > 1E6 and n <= 5.

I have functions F and G, which are composed like this: F(u, G(u, t)). u is a 1 x n array, t is a scalar, and F and G return 1 x n arrays.

I need to evaluate F for each row of a, using the previously evaluated row as the u-array for the next evaluation. I need to make m such evaluations.

This has to be really fast. I was previously impressed by how scitools.std StringFunction evaluates a whole array at once, but this problem requires using the previously calculated array as an argument in calculating the next one. I don't know if StringFunction can do this.

For example:

from numpy import asarray, cos, zeros

a = zeros((1000000, 4))
a[0] = asarray([1.,69.,3.,4.1])

# A is a float defined elsewhere, h is a function which accepts a float as its argument and returns an arbitrary float. h is defined elsewhere.

def G(u, t):
  return asarray([u[0], u[1]*A, cos(u[2]), t*h(u[3])])

def F(u, t):
  return u + G(u, t)


dt = 1E-6

for i in range(1, 1000000):
  a[i] = F(a[i-1], i*dt)

The problem with the above code is that it is slow as hell. I need to get these calculations done in milliseconds with numpy.

How can I do what I want?

Thank you for your time.

Kind regards,

Marius

Solution

This sort of thing is very difficult to do in numpy, because each row depends on the previous one and the loop cannot be vectorized across rows. If we look at this by column we see a few simpler solutions.

a[:,0] is very easy:

import numpy as np

col0 = np.ones((1000))*2     #Each step doubles the value.
col0[0] = 1                  #Or whatever start value.
np.cumprod(col0, out=col0)

np.allclose(col0, a[:1000,0])
True

As the other answer points out, this will overflow very quickly. a[:,1] can be done much along the same lines.
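To make the "same lines" remark concrete, here is a minimal sketch for column 1. It assumes that, since F adds u[1]*A to u[1], every step multiplies the value by (1 + A). The value A = 0.5 is made up for illustration (the question defines A elsewhere), and ref, col0_long and first_inf are names introduced only for this example. The last part shows where a float64 column that doubles each step leaves the representable range.

```python
import numpy as np

A = 0.5          # hypothetical value; the question defines A elsewhere
start = 69.0     # a[0, 1] from the question
count = 1000

# F adds u[1]*A to u[1], so each step multiplies the value by (1 + A).
col1 = np.full(count, 1.0 + A)
col1[0] = start
col1 = np.cumprod(col1)

# Reference: the same recursion as an explicit loop over the first rows.
ref = [start]
for _ in range(9):
    ref.append(ref[-1] + ref[-1] * A)

# Column 0 doubles each step; find the first row where float64 gives inf.
col0_long = np.full(1100, 2.0)
col0_long[0] = 1.0
with np.errstate(over="ignore"):
    col0_long = np.cumprod(col0_long)
first_inf = int(np.argmax(np.isinf(col0_long)))
```

Comparing col1[:10] against ref is a cheap way to check the growth factor before trusting the vectorized version on the full array.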

I do not believe there is a way to do the next two columns inside numpy alone quickly. We can turn to numba for this:

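import numpy as np
from numba import autojit  # newer numba releases dropped autojit; use numba.jit there

def python_loop(start, count):
    # Iterates out[x+1] = out[x] + cos(out[x]), i.e. column 2 of a.
    out = np.zeros((count), dtype=np.double)
    out[0] = start
    for x in range(count-1):
        out[x+1] = out[x] + np.cos(out[x])
    return out

numba_loop = autojit(python_loop)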

np.allclose(numba_loop(3,1000),a[:1000,2])
True

%timeit python_loop(3,1000000)
1 loops, best of 3: 4.14 s per loop

%timeit numba_loop(3,1000000)
1 loops, best of 3: 42.5 ms per loop

It's worth pointing out, though, that this converges to pi/2 very quickly (other start values can settle on pi/2 + 2*pi*k instead), so there is little point in calculating this recursion past ~20 values. The following returns exactly the same answer to double precision. I didn't bother finding the cutoff, but it is much less than 50:

%timeit tmp = np.empty((1000000)); 
        tmp[:50] = numba_loop(3,50);
        tmp[50:] = np.pi/2
100 loops, best of 3: 2.25 ms per loop
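A quick way to see how short that cutoff is: the sketch below iterates u -> u + cos(u) in plain Python and counts the steps until the value stops changing in double precision. The helper name steps_until_fixed and the max_steps cap are made up for this example; it is only meant as an illustration, not as a replacement for the numba loop.

```python
import math


def steps_until_fixed(start, max_steps=50):
    """Iterate u -> u + cos(u) until the float no longer changes.

    Returns (number of steps taken, final value). max_steps is a safety cap.
    """
    u = float(start)
    for k in range(max_steps):
        nxt = u + math.cos(u)
        if nxt == u:          # cos(u) is now too small to change u
            return k, u
        u = nxt
    return max_steps, u


steps, limit = steps_until_fixed(3.0)
print(steps, limit, math.pi / 2)
```

Running this for the start values you actually use tells you how many leading rows need the real recursion; the rest of the column can be filled with the limit.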

You can do something similar with the fourth column. Of course you can autojit all of the functions, but this gives you several different options to try out depending on numba usage:

  1. Use cumprod for the first two columns
  2. Use an approximation for column 3 (and possibly 4) where only the first few iterations are calculated
  3. Implement columns 3 and 4 in numba using autojit
  4. Wrap everything inside of an autojit loop (the best option)
  5. The way you have presented this all rows past ~200 will either be np.inf or np.pi/2. Exploit this.
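Option 4 could look roughly like the sketch below. Several things here are assumptions rather than part of the original answer: it uses numba.njit (the decorator that current numba releases provide in place of autojit), A = 0.5 and the body of h are invented stand-ins for the values "defined elsewhere" in the question, and fill_rows is a name chosen for this example. If numba is not installed, the fallback decorator lets the same code run as plain (slow) Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed.
    def njit(func):
        return func

A = 0.5  # hypothetical constant; the question defines A elsewhere


@njit
def h(x):
    # Hypothetical stand-in for the question's h; it must be jit-compatible.
    return 0.5 * x


@njit
def fill_rows(a, dt):
    # Row i is F(a[i-1], i*dt) with F(u, t) = u + G(u, t), one column at a time.
    for i in range(1, a.shape[0]):
        t = i * dt
        a[i, 0] = a[i - 1, 0] + a[i - 1, 0]
        a[i, 1] = a[i - 1, 1] + a[i - 1, 1] * A
        a[i, 2] = a[i - 1, 2] + np.cos(a[i - 1, 2])
        a[i, 3] = a[i - 1, 3] + t * h(a[i - 1, 3])
    return a


a = np.zeros((1000, 4))
a[0] = [1.0, 69.0, 3.0, 4.1]
fill_rows(a, 1e-6)
```

Writing each column out explicitly avoids allocating a temporary array per row, which is where most of the time goes in the original pure-Python loop.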

Other tips

Slightly faster. Your first column is basically 2^n. Calculating 2^n for n up to 1000000 is going to overflow, and the second column is even worse.

import numpy as np

def calc(arr, t0=1E-6):
    # A is the float defined elsewhere in the question; it is read as a global here.
    u = arr[0]
    dt = 1E-6
    h = lambda x: np.random.random()*50.0   # placeholder for the real h

    # One generator per column; each yields the next value of its recursion.
    def firstColGen(uStart):
        u = uStart
        while True:
            u += u
            yield u

    def secondColGen(uStart, A):
        u = uStart
        while True:
            u += u*A
            yield u

    def thirdColGen(uStart):
        u = uStart
        while True:
            u += np.cos(u)
            yield u

    def fourthColGen(uStart, h, t0, dt):
        u = uStart
        t = t0
        while True:
            u += t * h(u)   # G's fourth component is t*h(u[3])
            t += dt
            yield u

    first = firstColGen(u[0])
    second = secondColGen(u[1], A)
    third = thirdColGen(u[2])
    fourth = fourthColGen(u[3], h, t0, dt)

    for i in range(1, len(arr)):
        arr[i] = [next(first), next(second), next(third), next(fourth)]

Licensed under: CC-BY-SA with attribution