Choosing hidden layer sizes for backpropagation networks is a bit of a black art, but you can reason about it to some extent. During learning, the network determines the parameters of separating hyperplanes in a high-dimensional space that classify your inputs correctly; each hidden neuron contributes one such boundary, so you need a sufficiently large number of neurons to discriminate between your different inputs. It was still an active research topic when I studied neural networks more than 5 years ago. Perhaps have a look at this paper: An algebraic projection analysis for optimal hidden units size and learning rates in back-propagation learning
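In practice the most common approach is still empirical: try several hidden layer sizes and keep the one that generalizes best on held-out data. Here is a minimal sketch of that idea, assuming scikit-learn and a synthetic dataset (the specific sizes and dataset parameters are just illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic classification problem, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

best_size, best_score = None, -1.0
for n_hidden in (5, 10, 25, 50, 100):
    mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=1000,
                        random_state=0)
    mlp.fit(X_train, y_train)
    score = mlp.score(X_val, y_val)  # accuracy on held-out data
    print(f"{n_hidden:4d} hidden units -> validation accuracy {score:.3f}")
    if score > best_score:
        best_size, best_score = n_hidden, score

print(f"best hidden layer size: {best_size}")
```

Too few units and the network cannot carve out enough decision regions (underfitting); too many and it memorizes the training set, which shows up as a gap between training and validation accuracy.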
With other kinds of networks, such as recurrent networks, there are some techniques that can help you find the right architecture, for example visualizing the learned weights, which sometimes clearly resemble features of the inputs.
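To illustrate the weight-visualization idea (here with a plain feedforward MLP on images rather than a recurrent network, just because it is easy to show), you can reshape each hidden unit's incoming weights back to the input image shape and plot them. A minimal sketch, assuming scikit-learn's digits dataset and matplotlib:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

digits = load_digits()  # 8x8 grayscale digit images, flattened to 64 features
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(digits.data, digits.target)

# coefs_[0] has shape (64, 16): one column of input weights per hidden unit.
fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for ax, weights in zip(axes.ravel(), mlp.coefs_[0].T):
    ax.imshow(weights.reshape(8, 8), cmap="gray")  # weights shown as an 8x8 image
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()
```

If the training worked well, some of the weight images look like strokes or blobs that match parts of the digits, which is a quick sanity check that the hidden layer is learning meaningful features rather than noise.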