Thinking about it logically, the first layer of weights should give you a representation (the hidden layer) that is useful for predicting both outputs, so this layer should be updated based on the errors made at both outputs. The next layer of weights, however, is separate for each output node, so each should get its own weight update.
So for the second-layer weights, the updates are calculated separately from their respective outputs. For the first-layer weights, I would first calculate the error derivatives by backpropagating from each output separately and then simply sum them to get the final error derivative, and then apply the learning rate to get the weight updates.
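Here is a minimal NumPy sketch of that idea, assuming a squared-error loss, a tanh hidden layer, and hypothetical layer sizes; the point is just that the first-layer gradient is the sum of the error signals backpropagated from each output:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # single input sample, 4 features
t1, t2 = 0.5, -0.2                   # targets for the two outputs

W1 = rng.normal(size=(3, 4))         # shared first-layer weights
w2a = rng.normal(size=(1, 3))        # second-layer weights for output 1
w2b = rng.normal(size=(1, 3))        # second-layer weights for output 2

# Forward pass
h = np.tanh(W1 @ x)                  # shared hidden representation
y1 = float(w2a @ h)                  # output 1
y2 = float(w2b @ h)                  # output 2

# Backward pass: each second-layer weight vector gets its own gradient ...
d1, d2 = y1 - t1, y2 - t2            # dE/dy under squared error
grad_w2a = d1 * h.T
grad_w2b = d2 * h.T

# ... while the first-layer gradient is the SUM of the error signals
# backpropagated from each output through its own second-layer weights.
dh = (w2a.T * d1 + w2b.T * d2) * (1 - h**2)   # tanh derivative
grad_W1 = dh @ x.T

lr = 0.01                            # learning rate applied last
W1 -= lr * grad_W1
w2a -= lr * grad_w2a
w2b -= lr * grad_w2b
```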
Watch out for the dynamic range of your outputs. For example, if one output produces real values in the range [0,10] and another produces values in the range [-1000,1000], then your updates will be dominated by the output with the larger range. You can either
- add a preprocessing step that rescales your data set so both outputs have the same dynamic range, plus a postprocessing step to restore the original range (see the sketch after this list), or
- formulate the error function for each output so that both produce error values of the same dynamic range.
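A small sketch of the preprocessing option, with made-up target data for illustration: each target column is mapped to [0,1] before training so neither output dominates the loss, and predictions are mapped back afterwards.

```python
import numpy as np

# Hypothetical targets: column 0 lies in [0,10], column 1 in [-1000,1000]
targets = np.array([[3.2, -250.0],
                    [7.9,  800.0],
                    [0.4, -990.0]])

t_min = targets.min(axis=0)
t_max = targets.max(axis=0)

def preprocess(t):
    """Scale each output column to [0, 1] so the ranges match."""
    return (t - t_min) / (t_max - t_min)

def postprocess(t_scaled):
    """Restore predictions to their original dynamic range."""
    return t_scaled * (t_max - t_min) + t_min

scaled = preprocess(targets)        # train the network against these
restored = postprocess(scaled)      # apply to the network's predictions
assert np.allclose(restored, targets)
```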