Strictly speaking, you don't need a sigmoid activation function. What you need is a differentiable function that serves as an approximation to the step function. As an alternative to the sigmoid, you could instead use a hyperbolic tangent function.
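The interchangeability of these activations can be sketched as follows. This is an illustrative example (the function names are my own): both the sigmoid and the hyperbolic tangent are smooth approximations to the step function, and the tanh, rescaled to the interval (0, 1), coincides with the sigmoid exactly.

```python
import math

def step(x):
    """The (non-differentiable) step function used by the basic perceptron."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Logistic sigmoid: smooth, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_rescaled(x):
    """Hyperbolic tangent rescaled from (-1, 1) to (0, 1).

    Algebraically, (tanh(x/2) + 1) / 2 equals 1 / (1 + exp(-x)),
    i.e. the sigmoid itself.
    """
    return (math.tanh(x / 2.0) + 1.0) / 2.0

# Far from the origin, both smooth functions agree with the step function:
for x in (-6.0, 6.0):
    print(x, step(x), round(sigmoid(x), 4), round(tanh_rescaled(x), 4))
```

Either choice works for learning; the tanh's output range of (-1, 1) (before rescaling) is sometimes preferred because it is centered on zero.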
For multi-layer perceptron networks, the simple perceptron learning rule provides no means of determining how a weight several layers from the output should be adjusted in response to a given output error. The backpropagation learning rule relies on the fact that the sigmoid function is differentiable, which makes it possible to characterize the rate of change of the output-layer error with respect to a change in a particular weight, even if that weight is several layers away from the output. Note that as the k
parameter of the sigmoid tends toward infinity, the sigmoid approaches the step function, which is the activation function used in the basic perceptron.
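The two points above can be illustrated together. In this sketch (the parameter name k follows the text; the function names are my own), the sigmoid with gain k is sigma_k(x) = 1 / (1 + exp(-k*x)); its derivative, which backpropagation needs, has the closed form k * sigma_k(x) * (1 - sigma_k(x)); and increasing k drives sigma_k toward the step function.

```python
import math

def sigmoid_k(x, k=1.0):
    """Sigmoid with gain parameter k: 1 / (1 + exp(-k*x))."""
    return 1.0 / (1.0 + math.exp(-k * x))

def sigmoid_k_derivative(x, k=1.0):
    """d/dx of sigmoid_k, in the closed form k * s * (1 - s)."""
    s = sigmoid_k(x, k)
    return k * s * (1.0 - s)

# As k grows, sigmoid_k(x) at a fixed x > 0 approaches 1, the value the
# step function takes there; at x < 0 it approaches 0.
for k in (1, 10, 100):
    print(k, round(sigmoid_k(0.5, k), 6), round(sigmoid_k(-0.5, k), 6))
```

The closed-form derivative is what makes backpropagation cheap: once the forward pass has computed sigma_k(x), the gradient at that unit costs only a multiplication, no matter how many layers separate the unit from the output.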