What features exactly does Geoff Hinton's deep belief network extract from an image while training his handwritten digit recognition system?

StackOverflow https://stackoverflow.com/questions/23558417

Question

Hinton created and worked on a handwritten digit recognition system, and I want to know: what features exactly does he extract from the image? I went through his work; all I have seen is that he converts the image into a binary image, and after that I couldn't understand his way of extracting features from the image. Please help me understand this.


Solution

Deep learning is not about feature engineering. The whole point of Hinton's work was not to design any features. The system is trained on the raw image (just binarized), that's all. Everything else happens automatically in the deep training process (in his case, multi-layered unsupervised representation learning using a stack of Restricted Boltzmann Machines). What the system learns by itself is a multilevel representation, based on geometrical features of the image at many scales (from corners, through lines, to shapes).
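A minimal sketch of that idea, not Hinton's original code: greedily train a stack of Restricted Boltzmann Machines on raw binarized MNIST pixels using scikit-learn's BernoulliRBM, with no hand-designed features anywhere. The layer sizes and hyperparameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.neural_network import BernoulliRBM

X, y = fetch_openml('mnist_784', as_frame=False, return_X_y=True)
X = (X / 255.0 > 0.5).astype(np.float64)  # binarize: the only "preprocessing"

# Assumed layer sizes; the 2006 DBN paper used layers of 500, 500 and 2000 units.
layer_sizes = [500, 250]
rbms, inputs = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, batch_size=100, random_state=0)
    inputs = rbm.fit_transform(inputs)  # hidden activations feed the next RBM
    rbms.append(rbm)

# `inputs` is now a learned multilevel representation of the digits,
# obtained without any manual feature extraction.
```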

OTHER TIPS

As mentioned earlier, there is a subfield of (or an intersection with) Deep Learning called Representation Learning (or Feature Learning). These methods do indeed try to learn meaningful representations of the input data. This is especially useful for unsupervised learning, where one can get a lot of unlabeled data but obtaining labeled data is expensive.

One deep learning model for unsupervised feature learning is the AutoEncoder (basically a neural network with some constraints, trained to predict its own input). In nearly all papers on autoencoders you can find pictures like the one in Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, which visualizes the learned features as filter images.
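Here is a minimal denoising-autoencoder sketch, reusing the binarized MNIST array X from above. An MLPRegressor stands in for the encoder/decoder pair (an assumption, not the paper's exact setup): corrupted pixels go in, clean pixels come out. The 30% masking-noise level and the 100-unit hidden layer are illustrative choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_noisy = X * (rng.random(X.shape) > 0.3)  # masking noise: drop ~30% of pixels

ae = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                  max_iter=20, random_state=0)
ae.fit(X_noisy, X)  # the target is the *uncorrupted* image

X_reconstructed = ae.predict(X_noisy)
```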

What does this picture mean? Suppose you have a neural network whose input layer is the raw image pixels (in this case corrupted by some noise; remember the constraints!) and which predicts the same image, i.e. it has as many nodes in the output layer as in the input layer, but the targets are uncorrupted. Each neuron in the hidden layer is connected to all input nodes, so it has one weight per pixel. Arranging those weights as an image gives a visualization of the learned feature. Essentially, that hidden neuron filters the input image to extract one specific feature. For the filters to be useful, we would like them to expose some variation and structure (which is why case (a) in the picture is bad while (b) and (c) are better).
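You can produce that kind of filter picture yourself. The sketch below, which assumes the `rbms` stack trained in the earlier example, reshapes each first-layer hidden unit's 784-element weight vector back into a 28x28 image:

```python
import matplotlib.pyplot as plt

first_layer = rbms[0]  # its components_ has one 784-weight row per hidden unit
fig, axes = plt.subplots(4, 8, figsize=(8, 4))
for ax, weights in zip(axes.ravel(), first_layer.components_):
    ax.imshow(weights.reshape(28, 28), cmap='gray')  # one filter per hidden unit
    ax.axis('off')
fig.suptitle('Filters learned from raw pixels (one per hidden unit)')
plt.show()
```

Useful filters should show localized strokes, edges, or blobs rather than uniform noise, which is exactly the variation-and-structure criterion described above.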

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow