Resolved it. Apparently lower-dimensional networks are more likely to get stuck in a local minimum. This makes sense once you consider that higher-dimensional networks are less likely to settle into any minimum at all, even the global one.
Implementing momentum that increases with each iteration gets me through most of the local minima. On top of that, re-initializing the weights to random values in (-0.5, 0.5) and running multiple training sessions eventually gets me through all of them.
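The combination can be sketched on a toy 1-D loss that has both a local and a global minimum. The loss function, learning rate, iteration counts, and momentum schedule below are illustrative assumptions, not the actual network; only the two ideas (momentum ramped up per iteration, plus random restarts in (-0.5, 0.5)) come from what I did:

```python
import random

def loss(w):
    # Toy loss with a local minimum near w ~ 1.13 and a global one near w ~ -1.30.
    return w**4 - 3*w**2 + w

def grad(w):
    return 4*w**3 - 6*w + 1

def train(w0, iters=500, lr=0.01, m_start=0.5, m_max=0.95):
    """Gradient descent whose momentum grows linearly over the run."""
    w, v = w0, 0.0
    for t in range(iters):
        # Momentum increases with each iteration, helping roll through shallow minima.
        m = m_start + (m_max - m_start) * t / iters
        v = m * v - lr * grad(w)
        w += v
    return w

def train_with_restarts(restarts=20):
    """Repeat training from fresh random weights; keep the best result."""
    best_w, best_loss = None, float("inf")
    for _ in range(restarts):
        w0 = random.uniform(-0.5, 0.5)  # re-initialize in (-0.5, 0.5)
        w = train(w0)
        if loss(w) < best_loss:
            best_w, best_loss = w, loss(w)
    return best_w, best_loss
```

A single run can still land in the local minimum depending on the starting point; the restart loop is what makes reaching the global one reliable in practice.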
I am happy to report that my network now completes training in 100% of cases, as long as the data is classifiable.