Frage

To speed up my code I am converting a multidimensional sumproduct function from Python to Theano. My Theano code reaches the same result, but only calculates the result for one dimension at a time, so that I have to use a Python for-loop to get the end result. I assume that would make the code slow, because Theano cannot optimize memory usage and transfer (for the gpu) between multiple function calls. Or is this a wrong assumption?

So how can I change the Theano code, so that the sumprod is calculated in one function call?

The original Python function:

def sumprod(a1, a2):
    """Sum the element-wise products of the `a1` and `a2`."""
    result = numpy.zeros_like(a1[0])
    for i, j in zip(a1, a2):
        result += i*j
    return result

For the following input

a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])

the output would be: [ 26. 40. 65.] that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7

The Theano version of the code:

import theano
import theano.tensor as T
import numpy

a1 = ([1, 2, 4], [5, 6, 7])
a2 = ([1, 2, 4], [5, 6, 7])

# wanted result:  [ 26.  40.  65.]
# that is 1*1 + 5*5, 2*2 + 6*6 and 4*4 + 7*7

Tk = T.iscalar('Tk')
Ta1_shared = theano.shared(numpy.array(a1).T)
Ta2_shared = theano.shared(numpy.array(a2).T)

outputs_info = T.as_tensor_variable(numpy.asarray(0, 'float64'))

Tsumprod_result, updates = theano.scan(fn=lambda Ta1_shared, Ta2_shared, prior_value: 
                                       prior_value + Ta1_shared * Ta2_shared,
                                       outputs_info=outputs_info,
                                       sequences=[Ta1_shared[Tk], Ta2_shared[Tk]])
Tsumprod_result = Tsumprod_result[-1]

Tsumprod = theano.function([Tk], outputs=Tsumprod_result)

result = numpy.zeros_like(a1[0])
for i in range(len(a1[0])):
    result[i] = Tsumprod(i)
print result
War es hilfreich?

Lösung

First, there is more people that will answer your questions on theano mailing list then on stackoverflow. But I'm here:)

First, your function isn't a good fit for GPU. Even if everything was well optimized, the transfer of the input to the gpu just to add and sum the result will take more time to run then the python version.

Your python code is slow, here is a version that should be faster:

def sumprod(a1, a2):
    """Sum the element-wise products of the `a1` and `a2`."""
    a1 = numpy.asarray(a1)
    a2 = numpy.asarray(a2)
    result (a1 * a2).sum(axis=0)
    return result

For the theano code, here is the equivalent of this faster python version(no need of scan)

m1 = theano.tensor.matrix()
m2 = theano.tensor.matrix()
f = theano.function([m1, m2], (m1 * m2).sum(axis=0))

The think to remember from this is that you need to "vectorize" your code. The "vectorize" is used in the NumPy context and it mean to use numpy.ndarray and use function that work on the full tensor at a time. This is always faster then doing it with loop (python loop or theano scan). Also, Theano optimize some of thoses cases by moving the computation outside the scan, but it don't always do it.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top