The formulation was proposed by Irwin Sobel a long time ago: he and Gary Feldman presented it in a 1968 talk at the Stanford Artificial Intelligence Project. There is a great page on the subject here.
The main advantage of convolving the 3x3 neighbourhood (9 pixels) centred on the pixel at which gradients are to be detected is that the kernel weights are only 0, ±1 and ±2, so this simple operator is really fast and can be done with shifts and adds in low-cost hardware.
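To make the "shifts and adds" point concrete, here is a minimal sketch in Python. The function name `sobel_at` and the list-of-lists image layout are my own assumptions for illustration, not part of any library, and the `|gx| + |gy|` magnitude is a common cheap approximation rather than the true Euclidean magnitude.

```python
def sobel_at(img, y, x):
    """Return (gx, gy) at interior pixel (y, x) of a 2D list of ints.

    Hypothetical helper: uses only additions, subtractions and one
    left shift per direction, since the Sobel weights are 0, +/-1, +/-2.
    """
    # Horizontal gradient: right column minus left column,
    # with the middle row weighted by 2 (a shift left by 1).
    gx = ((img[y - 1][x + 1] - img[y - 1][x - 1])
          + ((img[y][x + 1] - img[y][x - 1]) << 1)
          + (img[y + 1][x + 1] - img[y + 1][x - 1]))

    # Vertical gradient: bottom row minus top row,
    # with the middle column weighted by 2.
    gy = ((img[y + 1][x - 1] - img[y - 1][x - 1])
          + ((img[y + 1][x] - img[y - 1][x]) << 1)
          + (img[y + 1][x + 1] - img[y - 1][x + 1]))
    return gx, gy


def edge_strength(img, y, x):
    """Cheap magnitude approximation |gx| + |gy| (avoids a square root)."""
    gx, gy = sobel_at(img, y, x)
    return abs(gx) + abs(gy)


# A small image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
print(sobel_at(image, 1, 1))       # (40, 0)
print(edge_strength(image, 1, 1))  # 40
```

No multiplications are needed anywhere, which is why the operator maps so well onto cheap hardware.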
Sobel operators are not the greatest edge detectors in the world (Google Canny edge detectors for something better), but they are fast and suitable for a lot of simple applications.