Question

I have a simple question for which I expected to find an easy answer online already, but I did not.

I have a Python project that works a lot with NumPy, due to matrix operations. I wanted to speed the code up, so I did some profiling to find the right tool for the job. It turns out the overhead is not Python itself, but the matrix operations. Hence I thought I should use Theano (especially given that I am implementing machine learning algorithms, and this is what it was made for).

Most of the overhead of my project is in one function, and I was wondering whether it is somehow possible to just rewrite that function with Theano, then get the NumPy arrays out of it and continue the computation as usual.

This is again just to test how much speed up I will obtain without committing myself to changing a lot of code.

Thank you!

Edit: the function in question is this one:

import numpy as np

def backprop(weights, layerValues, finalLayerErrors, activationFunctions):
  nrLayers = len(weights) + 1
  deDw = []
  deDbias = []
  upperLayerErrors = finalLayerErrors

  for layer in xrange(nrLayers - 1, 0, -1):
    deDz = activationFunctions[layer - 1].derivativeForLinearSum(
                        upperLayerErrors, layerValues[layer])
    upperLayerErrors = np.dot(deDz, weights[layer - 1].T)

    dw = np.einsum('ij,ik->jk', layerValues[layer - 1], deDz)

    dbias = deDz.sum(axis=0)

    # Iterating in decreasing order of layers, so we are required to
    # append the weight derivatives at the front as we go along
    deDw.insert(0, dw)
    deDbias.insert(0, dbias)

  return deDw, deDbias

Solution

You don't need to change your whole script to use Theano. You can reimplement just part of your code with Theano and it will work. Theano takes NumPy ndarrays as input and returns NumPy ndarrays as output by default, so integrating with the rest of your NumPy code is easy.
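For example, here is a minimal sketch of compiling a small Theano function and calling it with ordinary NumPy arrays (the variable names and shapes are mine, purely illustrative); the result comes back as a NumPy ndarray:

import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')
w = T.dmatrix('w')
f = theano.function([x, w], T.dot(x, w))   # compile once, reuse in the loop

a = np.random.randn(128, 64)
b = np.random.randn(64, 32)
out = f(a, b)                              # plain NumPy in, plain NumPy out
print(type(out))                           # numpy.ndarray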

Theano doesn't implement einsum, so I would recommend that you start by replacing it with a call to dot, as Patrick said. I have seen and heard many times that einsum is slower than a call to dot in some cases. If that speed up isn't good enough, Theano can help you. If you only move the dot call to Theano, Theano won't be faster when Theano and NumPy are linked against the same BLAS library. But Theano should make your code faster if you move more of the computation to it. It isn't perfect, though: some cases see no speed up, and rare cases are slower than NumPy (mostly when the input shapes aren't big enough, e.g. scalars).
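As a sketch of what "moving more computation" into Theano could look like here, the whole inner-loop step of your backprop (error propagation, dw and dbias) can be compiled into a single Theano function; the einsum is expressed with dot, and all names below are illustrative, not from your code:

import theano
import theano.tensor as T

prev_vals = T.dmatrix('prev_vals')         # corresponds to layerValues[layer - 1]
dedz = T.dmatrix('dedz')                   # corresponds to deDz
w = T.dmatrix('w')                         # corresponds to weights[layer - 1]

upper_err = T.dot(dedz, w.T)               # error propagated to the layer below
dw = T.dot(prev_vals.T, dedz)              # same result as einsum('ij,ik->jk', ...)
dbias = dedz.sum(axis=0)

layer_step = theano.function([prev_vals, dedz, w], [upper_err, dw, dbias])

Calling layer_step inside your existing Python loop still accepts and returns NumPy arrays, so the surrounding code can stay as it is.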

About Patrick's answer: you don't need to use Theano's symbolic gradient to benefit from Theano. If you do want the symbolic gradient, Theano can only compute it inside a computation graph built in Theano, so you would need to convert that part completely to Theano. But since your code already works, it means you have implemented the gradient manually. That is fine and doesn't cause any problem: you can move that manual implementation of the gradient to Theano without using the symbolic gradient.
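For reference, this is roughly what the symbolic gradient looks like once a cost is expressed in Theano (a toy least-squares cost, chosen only for illustration and not taken from your code):

import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
w = T.dmatrix('w')

cost = ((T.dot(x, w) - y) ** 2).mean()
gw = T.grad(cost, w)                       # symbolic gradient of cost w.r.t. w
grad_fn = theano.function([x, w, y], gw)   # callable with NumPy arrays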

About Patrick's comment on the GPU: don't forget that transferring data between CPU and GPU is the most costly operation on the GPU. It can completely cancel the GPU speed up in many cases, so it isn't certain that doing only the dot on the GPU will help. In Theano we keep the weights on the GPU; without doing that I don't think you can get a speed up from the GPU (whichever tool you use: gnumpy, Theano or something else). The case where doing only the dot on the GPU still gives a speed up is with gigantic matrices.
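Keeping the weights on the GPU in Theano is done with shared variables; a minimal sketch (the shapes are made up for illustration), where the weight matrix stays on the device between calls and only the inputs are transferred:

import numpy as np
import theano
import theano.tensor as T

# Weight matrix stored as a shared variable; it lives on the GPU if one is configured
w = theano.shared(np.random.randn(2000, 2000).astype(theano.config.floatX), name='w')
x = T.matrix('x')
f = theano.function([x], T.dot(x, w))

out = f(np.random.randn(256, 2000).astype(theano.config.floatX))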

Other tips

If you have code running in NumPy, translating it to Theano is not going to make it magically faster, and it's going to take a significant coding effort.

Theano is really nice, but choosing it is more of a design-time decision: you want the niceties of symbolic differentiation so you don't have to calculate your gradients by hand for backprop, so you use that framework.

If you already have the NumPy code ready, just optimize the bottleneck, which in your case seems to be that einsum contraction (effectively a matrix product). I would try using the dot function instead of einsum, i.e. instead of:

dw = np.einsum('ij,ik->jk', layerValues[layer - 1], deDz)

Try:

dw = layerValues[layer-1].T.dot(deDz)
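A quick way to check whether the swap pays off is to time both forms on arrays of roughly your batch and layer sizes (the shapes below are only a guess):

import timeit
import numpy as np

a = np.random.randn(256, 784)    # stand-in for layerValues[layer - 1], shape is a guess
b = np.random.randn(256, 500)    # stand-in for deDz, shape is a guess

print(timeit.timeit(lambda: np.einsum('ij,ik->jk', a, b), number=100))
print(timeit.timeit(lambda: a.T.dot(b), number=100))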

Sometimes einsum is much less optimized than a direct dot call, so that alone might be faster. If not, consider either:

  1. using the GPU for the matrix multiplication, via gnumpy, or
  2. using a better algorithm that'll converge faster than plain old gradient descent (which I assume is what you're using): Adagrad, or a second-order optimization algorithm such as L-BFGS (a minimal Adagrad sketch follows below).
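For concreteness, here is a minimal Adagrad update in NumPy (the function name and default hyperparameters are mine, chosen purely for illustration):

import numpy as np

def adagrad_update(param, grad, cache, lr=0.01, eps=1e-8):
    # Accumulate squared gradients, then scale each parameter's step
    # by the inverse square root of its own gradient history.
    cache += grad ** 2
    param -= lr * grad / (np.sqrt(cache) + eps)
    return param, cache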