Question

I have a simple question for which I expected to find an easy answer online already, but I did not.

I have a Python project that works a lot with NumPy, due to matrix operations. I wanted to speed the code up, so I did some profiling to find the right tool for the job. It turns out the overhead is not Python itself, but the matrix operations. Hence I thought I should use Theano (especially given that I am implementing machine learning algorithms, and this is what it was made for).

Most of the overhead of my project is in one function, and I was wondering whether it is somehow possible to just rewrite that function with Theano, then get the NumPy arrays out of it and continue the computation as usual.

This is again just to test how much speed up I will obtain without committing myself to changing a lot of code.

Thank you!

Edit: the function in question is this one:

import numpy as np

def backprop(weights, layerValues, finalLayerErrors, activationFunctions):
  nrLayers = len(weights) + 1
  deDw = []
  deDbias = []
  upperLayerErrors = finalLayerErrors

  for layer in xrange(nrLayers - 1, 0, -1):
    deDz = activationFunctions[layer - 1].derivativeForLinearSum(
                        upperLayerErrors, layerValues[layer])
    upperLayerErrors = np.dot(deDz, weights[layer - 1].T)

    dw = np.einsum('ij,ik->jk', layerValues[layer - 1], deDz)

    dbias = deDz.sum(axis=0)

    # Iterating in decreasing order of layers, so we are required to
    # append the weight derivatives at the front as we go along
    deDw.insert(0, dw)
    deDbias.insert(0, dbias)

  return deDw, deDbias

Solution

You don't need to change your whole script to use Theano. You can reimplement just part of your code with Theano and it will work. Theano takes NumPy ndarrays as input and returns NumPy ndarrays as output by default, so integrating with the rest of your NumPy code is easy.
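For example, here is a minimal sketch of compiling a small Theano function and calling it with ordinary NumPy arrays (the variable names and shapes are mine, purely illustrative); the result comes back as a NumPy ndarray:

import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')
w = T.dmatrix('w')
f = theano.function([x, w], T.dot(x, w))   # compile once, reuse in the loop

a = np.random.randn(128, 64)
b = np.random.randn(64, 32)
out = f(a, b)                              # plain NumPy in, plain NumPy out
print(type(out))                           # numpy.ndarray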

Theano doesn't implement einsum, so I would recommend that you start by replacing it with a call to dot, as Patrick said. I have seen and heard many times that einsum is slower than a call to dot in some cases. If that speed up isn't good enough, Theano can help you. If you only move the dot call to Theano, Theano won't be faster when Theano and NumPy are linked against the same BLAS library. But Theano should make your code faster if you move more of the computation to it. It isn't perfect, though: some cases see no speed up, and rare cases are slower than NumPy (mostly when the input shapes aren't big enough, e.g. scalars).
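As a sketch of what "moving more computation" into Theano could look like here, the whole inner-loop step of your backprop (error propagation, dw and dbias) can be compiled into a single Theano function; the einsum is expressed with dot, and all names below are illustrative, not from your code:

import theano
import theano.tensor as T

prev_vals = T.dmatrix('prev_vals')         # corresponds to layerValues[layer - 1]
dedz = T.dmatrix('dedz')                   # corresponds to deDz
w = T.dmatrix('w')                         # corresponds to weights[layer - 1]

upper_err = T.dot(dedz, w.T)               # error propagated to the layer below
dw = T.dot(prev_vals.T, dedz)              # same result as einsum('ij,ik->jk', ...)
dbias = dedz.sum(axis=0)

layer_step = theano.function([prev_vals, dedz, w], [upper_err, dw, dbias])

Calling layer_step inside your existing Python loop still accepts and returns NumPy arrays, so the surrounding code can stay as it is.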

About Patrick's answer: you don't need to use Theano's symbolic gradient to benefit from Theano. If you do want the symbolic gradient, Theano can only compute it inside a computation graph built in Theano, so you would need to convert that part completely to Theano. But since your code already works, it means you have implemented the gradient manually. That is fine and doesn't cause any problem: you can move that manual implementation of the gradient to Theano without using the symbolic gradient.
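For reference, this is roughly what the symbolic gradient looks like once a cost is expressed in Theano (a toy least-squares cost, chosen only for illustration and not taken from your code):

import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
w = T.dmatrix('w')

cost = ((T.dot(x, w) - y) ** 2).mean()
gw = T.grad(cost, w)                       # symbolic gradient of cost w.r.t. w
grad_fn = theano.function([x, w, y], gw)   # callable with NumPy arrays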

About Patrick's comment on the GPU: don't forget that transferring data between CPU and GPU is the most costly operation on the GPU. It can completely cancel the GPU speed up in many cases, so it isn't certain that doing only the dot on the GPU will help. In Theano we keep the weights on the GPU; without doing that I don't think you can get a speed up from the GPU (whichever tool you use: gnumpy, Theano or something else). The case where doing only the dot on the GPU still gives a speed up is with gigantic matrices.
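Keeping the weights on the GPU in Theano is done with shared variables; a minimal sketch (the shapes are made up for illustration), where the weight matrix stays on the device between calls and only the inputs are transferred:

import numpy as np
import theano
import theano.tensor as T

# Weight matrix stored as a shared variable; it lives on the GPU if one is configured
w = theano.shared(np.random.randn(2000, 2000).astype(theano.config.floatX), name='w')
x = T.matrix('x')
f = theano.function([x], T.dot(x, w))

out = f(np.random.randn(256, 2000).astype(theano.config.floatX))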

Other tips

If you have code running in NumPy, translating it to Theano is not going to make it magically faster, and it's going to take a significant coding effort.

Theano is really nice, but choosing it is more of a design-time decision: you want the niceties of symbolic differentiation so you don't have to calculate your gradients by hand for backprop, so you use that framework.

If you already have the NumPy code ready, just optimize the bottleneck, which in your case seems to be that einsum contraction (effectively a matrix product). I would try using the dot function instead of einsum, i.e. instead of:

dw = np.einsum('ij,ik->jk', layerValues[layer - 1], deDz)

Try:

dw = layerValues[layer-1].T.dot(deDz)
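A quick way to check whether the swap pays off is to time both forms on arrays of roughly your batch and layer sizes (the shapes below are only a guess):

import timeit
import numpy as np

a = np.random.randn(256, 784)    # stand-in for layerValues[layer - 1], shape is a guess
b = np.random.randn(256, 500)    # stand-in for deDz, shape is a guess

print(timeit.timeit(lambda: np.einsum('ij,ik->jk', a, b), number=100))
print(timeit.timeit(lambda: a.T.dot(b), number=100))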

Sometimes einsum is much less optimized than a direct dot call, so that alone might be faster. If not, consider either:

  1. using the GPU for the matrix multiplication, via gnumpy, or
  2. using a better algorithm that'll converge faster than plain old gradient descent (which I assume is what you're using): Adagrad, or a second-order optimization algorithm such as L-BFGS (a minimal Adagrad sketch follows below).
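For concreteness, here is a minimal Adagrad update in NumPy (the function name and default hyperparameters are mine, chosen purely for illustration):

import numpy as np

def adagrad_update(param, grad, cache, lr=0.01, eps=1e-8):
    # Accumulate squared gradients, then scale each parameter's step
    # by the inverse square root of its own gradient history.
    cache += grad ** 2
    param -= lr * grad / (np.sqrt(cache) + eps)
    return param, cache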