Really, you just need to implement an ANN that uses the basic, usual sum-of-squares error. Then, replace the network's target output with the TD target, r + gamma*V(t+1), so that the training error becomes the TD error: E = r + gamma*V(t+1) - V(t).
From there, you can just use the typical ANN backprop weight update rule.
So, in short, I think your description is actually what an RL-via-ANN algorithm should do: it trains the ANN to learn the state/action value function.
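Here's a minimal sketch of the idea in Python/numpy, assuming a tiny one-hidden-layer network and a toy two-state episodic chain (all names and the environment are made up for illustration). It's semi-gradient TD(0): the TD target is treated as a fixed label, and the squared error is backpropagated through V(s) only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny value network: V(s) = w2 . tanh(W1 s + b1) + b2
n_in, n_hid = 4, 8
W1 = rng.normal(0, 0.1, (n_hid, n_in)); b1 = np.zeros(n_hid)
w2 = rng.normal(0, 0.1, n_hid);         b2 = 0.0

def value(s):
    h = np.tanh(W1 @ s + b1)
    return h, w2 @ h + b2

def td_update(s, r, s_next, done, gamma=0.9, lr=0.05):
    """Semi-gradient TD(0) step: error = (r + gamma*V(s') - V(s)),
    backpropagated through V(s) with the usual backprop rules."""
    global W1, b1, w2, b2
    h, v = value(s)
    _, v_next = value(s_next)
    target = r + (0.0 if done else gamma * v_next)  # TD target as the "label"
    delta = target - v                               # the TD error
    # Backprop of 0.5*delta**2 through the network (target held fixed):
    grad_h = delta * w2 * (1.0 - h**2)
    w2 += lr * delta * h
    b2 += lr * delta
    W1 += lr * np.outer(grad_h, s)
    b1 += lr * grad_h
    return delta

# Toy episodic chain: s0 -(r=0)-> s1 -(r=1)-> terminal
s0 = np.array([1., 0., 0., 0.])
s1 = np.array([0., 1., 0., 0.])
terminal = np.zeros(4)

for _ in range(3000):
    td_update(s0, 0.0, s1, done=False)
    td_update(s1, 1.0, terminal, done=True)

_, v0 = value(s0)
_, v1 = value(s1)
```

With gamma = 0.9, V(s1) should approach 1 and V(s0) should approach 0.9, which is exactly the value function being learned by backprop on the TD error.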