Problem representation
Using neural networks to represent the action-value function is a good idea; it has been shown to work well for a number of applications. However, a more natural representation of the Q-function would be a net that receives the combined state-action vector as input and has a scalar output. But as long as the number of actions is finite and small, it is fine to do it the way you did. Just remember that, strictly speaking, you are not learning Q(s,a) but multiple value functions V(s) (one for each action) that share the same weights except for the last layer.
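The last point can be sketched as follows. This is a hypothetical two-layer network (layer sizes and weights are made up for illustration) whose output layer has one unit per action, so each output column acts as a separate V(s) head on top of a shared hidden layer:

```python
import numpy as np

rng = np.random.default_rng(0)

n_state, n_hidden, n_actions = 4, 16, 3
W1 = rng.normal(size=(n_state, n_hidden))   # shared hidden-layer weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_actions)) # one output column per action
b2 = np.zeros(n_actions)

def q_values(s):
    """All Q(s, a) in one forward pass; the hidden layer h(s) is shared,
    only the last-layer column differs per action."""
    h = np.tanh(s @ W1 + b1)
    return h @ W2 + b2

s = rng.normal(size=n_state)
print(q_values(s).shape)  # one Q-value per action: (3,)
```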
Testing
This is a straightforward greedy exploitation of the Q-function; it should be correct.
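For reference, greedy exploitation amounts to picking argmax_a Q(s, a). A minimal sketch, where `q_net` is a hypothetical stand-in for your trained network:

```python
import numpy as np

def q_net(state):
    # Dummy Q-values for illustration; your net would compute these.
    return np.array([0.1, 0.7, -0.2])

def greedy_action(state):
    """Exploit: choose the action with the highest predicted Q-value."""
    return int(np.argmax(q_net(state)))

print(greedy_action(None))  # action 1, since 0.7 is the largest Q-value
```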
Learning
There are several pitfalls here that you will have to think about. The first one is scaling: for neural network learning you really need to scale the inputs to the same range. If you use a sigmoidal activation function in the output layer, you may also have to scale the target values.
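Input scaling could look like this: each state dimension is mapped to [-1, 1] using known or estimated bounds (the bounds below are assumptions for illustration):

```python
import numpy as np

# Assumed per-dimension state bounds; replace with your task's values.
state_low = np.array([-2.4, -3.0, -0.2, -3.5])
state_high = np.array([2.4, 3.0, 0.2, 3.5])

def scale_state(s):
    """Map each state dimension linearly into [-1, 1]."""
    return 2.0 * (s - state_low) / (state_high - state_low) - 1.0

print(scale_state(state_low))   # all -1
print(scale_state(state_high))  # all +1
```

The same kind of linear map can be applied to the targets if the output activation saturates outside a fixed range.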
Data efficiency is another thing to think about. You can perform multiple updates of the net with each transition sample; learning will be faster, but you have to store each transition sample in memory.
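A minimal sketch of such a replay memory, where the list `updates` is a counting stand-in for a real (hypothetical) `update_net` training step:

```python
import random
from collections import deque

memory = deque(maxlen=10_000)  # stored (s, a, r, s_next, done) tuples

def store(transition):
    memory.append(transition)

def replay(update_net, n_updates=4):
    """Reuse stored transitions: several net updates per environment step."""
    for _ in range(n_updates):
        update_net(random.choice(memory))

# Illustrate with dummy transitions and a counting update function.
for t in range(10):
    store((t, 0, 0.0, t + 1, False))
updates = []
replay(updates.append, n_updates=4)
print(len(updates))  # 4 updates performed from one replay call
```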
Online vs. batch: if you store your samples, you can do batch learning and avoid the problem that recent samples overwrite already learned parts of the problem.
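In that setting each update averages over a random minibatch drawn from the stored samples instead of only the latest transition, so a single recent sample cannot wipe out what was learned elsewhere. A sketch of the sampling step:

```python
import random

def sample_batch(memory, batch_size=32):
    """Draw a random minibatch (without replacement) from stored samples."""
    return random.sample(memory, min(batch_size, len(memory)))

# Dummy stored transitions for illustration.
memory = [(s, 0, 0.0, s + 1, False) for s in range(100)]
batch = sample_batch(memory)
print(len(batch))  # 32 transitions per update
```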
Literature
You should have a look at