Artificial Neural Network training with 6 features

Question 1

Your main function is fine. However, either your training vectors or your backpropagation code is not (assuming your network is big enough to learn this). So this is going to be a bunch of questions instead of an answer, but they may point you to the right idea:

How many samples does your training vector include?
Are those samples roughly classified half/half or is there a bias?
Are there identical training samples that are classified ambiguously?
How is the error calculated? Mean absolute error or mean squared error?
Do you randomize the initial network weights?
What is the initial error before training?
Does the error change in the first iteration?
Can you post the code on pastebin?
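
Several of the questions above (sample count, class balance, ambiguous duplicates) can be checked mechanically before touching the network. Here is a minimal sketch with NumPy; the function name `summarize_training_set` and the toy data are made up for illustration and are not taken from the asker's code:

```python
from collections import defaultdict

import numpy as np


def summarize_training_set(X, y):
    """Return sample count, class fractions, and the number of ambiguous inputs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Fraction of samples per class, to spot a strong bias towards one label.
    labels, counts = np.unique(y, return_counts=True)
    class_fractions = {
        label.item(): count / len(y) for label, count in zip(labels, counts)
    }

    # Group labels by identical feature vector to find contradictory samples.
    labels_by_input = defaultdict(set)
    for row, label in zip(X, y):
        labels_by_input[tuple(row)].add(label.item())
    ambiguous = sum(1 for seen in labels_by_input.values() if len(seen) > 1)

    return {
        "n_samples": len(y),
        "class_fractions": class_fractions,
        "ambiguous_inputs": ambiguous,
    }


# Toy data with 6 features; the first and last rows are identical
# but carry different labels, so one input is ambiguous.
X = [[0, 1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 1, 0, 1, 0, 1]]
y = [0, 1, 1, 1]

summary = summarize_training_set(X, y)
print(summary)
```

If `ambiguous_inputs` is non-zero, the error cannot reach zero on those samples no matter how the network is built.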

Question 2

Are you using batch learning or online learning? If the answer is batch, then your learning rate may be too high. You can try scaling it by dividing it by the number of training patterns. As @Marcom said, if you have too few neurons, your network has too little capacity. That's a bit hard to explain, but basically you aren't using the non-linear region of the neurons, and your network is biased, meaning it underfits.

Check here for a better explanation.
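
To see why the batch learning rate may need scaling, here is a rough sketch on a single linear neuron with squared error (not an MLP, and not the asker's code). The data, the names, and the base rate of 0.5 are assumptions chosen only to make the effect visible: the batch gradient is a sum over all patterns, so it grows with the number of patterns unless the step is divided by that number.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_features = 200, 6

# Synthetic regression data generated by a known weight vector.
X = rng.normal(size=(n_patterns, n_features))
true_w = rng.normal(size=n_features)
t = X @ true_w


def train(step_size, epochs=50):
    """Batch gradient descent on a linear neuron; returns the final MSE."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        error = X @ w - t
        summed_grad = X.T @ error  # gradient summed over all patterns
        w -= step_size * summed_grad
    return float(np.mean((X @ w - t) ** 2))


base_rate = 0.5
mse_unscaled = train(base_rate)              # step applied to the summed gradient
mse_scaled = train(base_rate / n_patterns)   # step divided by the pattern count

print("unscaled learning rate, final MSE:", mse_unscaled)
print("scaled learning rate, final MSE:  ", mse_scaled)
```

With the unscaled rate the weights blow up, while the scaled rate converges on this toy problem. Online learning updates after every single pattern, so it does not accumulate the gradient in the same way.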

Try with a huge number of neurons first, then you can decrease the number as long as the error keeps going down.
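
The "start big, then shrink" procedure can be sketched as a loop over hidden layer sizes. To keep the example short and deterministic, it swaps backpropagation for a simpler stand-in: the hidden tanh units get fixed random weights and only the linear output layer is fitted by least squares. The target function and all names are hypothetical, and the numbers are training error only, so they say nothing about overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, max_hidden = 200, 6, 64

X = rng.normal(size=(n_samples, n_features))
# Hypothetical non-linear target: the class depends on the product of two features.
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# Draw the largest hidden layer once, so smaller layers are nested subsets of it.
W_hidden = rng.normal(size=(n_features, max_hidden))
b_hidden = rng.normal(size=max_hidden)


def training_mse(n_hidden):
    """Training MSE with n_hidden fixed tanh units and a least-squares output layer."""
    H = np.tanh(X @ W_hidden[:, :n_hidden] + b_hidden[:n_hidden])
    H = np.hstack([H, np.ones((n_samples, 1))])  # bias column for the output unit
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    return float(np.mean((H @ w_out - y) ** 2))


hidden_sizes = [64, 16, 4, 1]  # start big, then shrink
errors = {h: training_mse(h) for h in hidden_sizes}
for h in hidden_sizes:
    print(f"{h:3d} hidden units -> training MSE {errors[h]:.4f}")
```

The size at which the training error starts to climb noticeably gives a rough lower bound on the capacity this toy problem needs.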

Question 3

Try experimenting with adding an additional hidden layer and also try increasing the number of hidden nodes. I can't give you a technical explanation off the top of my head, but if you have too few nodes the ANN might not be able to converge.

Question 4

A loss function that does not change at the start of training in an MLP usually means the network can't infer any rules that fit your training data (the gradient in your backpropagation can't find any meaningful local minimum). This can be caused by a lack of data for the problem you are trying to solve, or by an architecture that is too restricted.

Increasing the number of layers and/or their size should change that, although you will be prone to overfitting if your architecture is too complex. You will have to find a balance that fits your problem.

And don't hesitate to start with a low learning rate; setting it too high will cause your gradient to "bounce" and not converge.
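
The "bounce" is easy to reproduce on a one-dimensional toy loss. This sketch runs plain gradient descent on f(w) = w^2 (gradient 2w); the two learning rates are arbitrary values picked for illustration and are not recommendations for a real network.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimize f(w) = w**2 and return every visited value of w."""
    w = start
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - learning_rate * grad
        history.append(w)
    return history


low = gradient_descent(0.1)   # each step multiplies w by 0.8
high = gradient_descent(1.1)  # each step multiplies w by -1.2

print("low rate, first steps: ", [round(w, 3) for w in low[:5]])
print("high rate, first steps:", [round(w, 3) for w in high[:5]])
```

With the low rate, w shrinks steadily towards the minimum at 0. With the high rate, w jumps across the minimum on every step, flips sign, and grows in magnitude, which is the bouncing behaviour described above.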