Roughly speaking:
more linear problem => fewer hidden nodes, more non-linear problem => more hidden nodes
better generalisation needed => fewer hidden nodes, less generalisation needed => more hidden nodes
accurate answer (at least on your training set) => more hidden nodes, approximate answer => fewer hidden nodes
FYI: in the case of XOR, if both inputs are also connected straight to the output, then a single additional hidden node is all that is required. If no input-to-output connections are allowed, then two hidden nodes are the minimum.
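As a rough illustration of the XOR case above, here is a minimal sketch with hand-picked weights and simple threshold units. The function names, the weights and the step activation are my own assumptions for the example, not the output of any training run. It only shows that these two layouts are sufficient; it does not prove they are minimal.

```python
from itertools import product

def step(z):
    """Threshold unit: outputs 1 when its net input is positive, else 0."""
    return 1 if z > 0 else 0

def xor_one_hidden_with_skip(x1, x2):
    """One hidden node, with both inputs also wired straight to the output."""
    # The hidden node behaves like AND: it fires only for (1, 1).
    h = step(x1 + x2 - 1.5)
    # The output behaves like OR on the raw inputs, vetoed by the hidden node.
    return step(x1 + x2 - 2 * h - 0.5)

def xor_two_hidden_no_skip(x1, x2):
    """Two hidden nodes, with no direct input-to-output connections."""
    h_or = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only when both inputs are 1
    # The output fires for "OR but not AND", which is XOR.
    return step(h_or - h_and - 0.5)

if __name__ == "__main__":
    for x1, x2 in product((0, 1), repeat=2):
        print(x1, x2,
              xor_one_hidden_with_skip(x1, x2),
              xor_two_hidden_no_skip(x1, x2))
```

In both layouts the hidden layer supplies the one non-linear piece (detecting that both inputs are on) which a single threshold unit applied to the raw inputs cannot supply by itself.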
As for whether there is a formula giving the exact number of hidden nodes for problems in general: no.