I am pretty sure that you have misunderstood the concept here. Two possible strategies are:
- update weights after all errors for one input vector are calculated
- update weights after all errors for all the input vectors are calculated
which is completely different from what you have written. These two methods are the per-sample (online) and batch strategies; both have their pros and cons, and due to its simplicity the first approach is much more common in implementations.
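For concreteness, here is a minimal sketch of the two strategies on a toy linear model with squared error (the names `grad`, `train_online`, and `train_batch` are mine, not from your code):

```python
import numpy as np

def grad(w, x, t):
    # Gradient of E = 0.5 * (w.x - t)^2 with respect to w.
    return (w @ x - t) * x

def train_online(w, data, lr=0.1):
    # Strategy 1: update weights after the error for ONE sample is computed.
    for x, t in data:
        w = w - lr * grad(w, x, t)
    return w

def train_batch(w, data, lr=0.1):
    # Strategy 2: accumulate gradients over ALL samples, then update once.
    g = sum(grad(w, x, t) for x, t in data)
    return w - lr * g / len(data)

data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -1.0)]
w = np.zeros(2)
print(train_online(w, data), train_batch(w, data))
```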
Regarding your "methods", the second one is the only correct one. The process of "propagating" the error is just a computational simplification of computing the derivative of the error function, and the (basic) learning process is a steepest descent method. If you compute the derivative only for part of the dimensions (the output layer), perform a step in that direction, and then recalculate the error derivatives according to the new values, you are not performing gradient descent. The only scenario where the first method is acceptable is when your weight updates do not interfere with your error computation; then it does not matter which order is used, as the two are independent.
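To illustrate the order issue, here is a hedged sketch of one correct descent step on a tiny two-layer network (all names are illustrative): the key point is that both layer gradients are evaluated at the *same* weights before either update is applied.

```python
import numpy as np

def forward(W1, W2, x):
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def gradients(W1, W2, x, t):
    # Backprop = chain rule for dE/dW on E = 0.5 * ||y - t||^2.
    h, y = forward(W1, W2, x)
    delta2 = y - t                          # output-layer error
    dW2 = np.outer(delta2, h)
    delta1 = (W2.T @ delta2) * (1 - h**2)   # error propagated to hidden layer
    dW1 = np.outer(delta1, x)
    return dW1, dW2

def descent_step(W1, W2, x, t, lr=0.1):
    # Correct order: evaluate the FULL gradient first, then move.
    # Updating W2 before computing dW1 would break the steepest descent step.
    dW1, dW2 = gradients(W1, W2, x, t)
    return W1 - lr * dW1, W2 - lr * dW2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
W1, W2 = descent_step(W1, W2, np.array([1.0, -1.0]), np.array([0.5]))
```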