Backprop is an algorithm for computing the gradient of a loss function with respect to a neural network's weights. When combined with an optimization algorithm (commonly gradient descent or conjugate gradient, although others are used as well), it can be used to find the NN weights that minimize the loss function on a training set.
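To make that concrete, here's a minimal sketch of backprop plus plain gradient descent for a one-hidden-layer network with squared-error loss. All the sizes, data, and the learning rate are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))            # 32 samples, 4 features
y = rng.normal(size=(32, 1))            # targets
W1 = rng.normal(size=(4, 8)) * 0.1      # input -> hidden weights
W2 = rng.normal(size=(8, 1)) * 0.1      # hidden -> output weights
lr = 0.1

for step in range(100):
    # Forward pass
    h = sigmoid(X @ W1)                 # hidden activations
    y_hat = h @ W2                      # linear output
    loss = 0.5 * np.mean((y_hat - y) ** 2)

    # Backward pass (backprop): apply the chain rule layer by layer
    d_yhat = (y_hat - y) / len(X)       # dL/dy_hat
    dW2 = h.T @ d_yhat                  # dL/dW2
    d_h = d_yhat @ W2.T                 # dL/dh
    dW1 = X.T @ (d_h * h * (1 - h))     # dL/dW1 via the sigmoid derivative

    # Gradient-descent step using the backprop gradients
    W1 -= lr * dW1
    W2 -= lr * dW2
```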
I.e., the suggestion is that you train the neural net by minimizing a regularized cross-entropy loss. That's what's usually meant by "training" a neural net for classification, and it's what many NN libraries/toolkits actually do.
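For example, in PyTorch this is just a few lines. The model, data, and hyperparameters below are placeholders; `weight_decay` is the standard way to get an L2 penalty on the weights:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(128, 20)                      # 128 samples, 20 features
y = torch.randint(0, 3, (128,))               # 3 classes

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
criterion = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty, i.e., "regularized" cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(X), y)             # cross-entropy loss
    loss.backward()                           # backprop computes the gradients
    optimizer.step()                          # gradient-descent update
```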
(Of course, if P is some non-standard penalty term, you may have to implement the extra gradient computation yourself or use an extensible toolkit, e.g., one with automatic differentiation; non-differentiable penalty terms may also require changes to the optimization algorithm, such as subgradient or proximal methods.)
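As a sketch of the extensible case: with an autodiff toolkit you can often just add your penalty term to the loss and let the framework differentiate it. Here P is an L1 penalty (my choice for illustration, not from the question); note L1 is non-differentiable at 0, so autograd uses a subgradient there, which works in practice but exact sparsity usually calls for a proximal/soft-thresholding update instead of plain gradient descent:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(128, 20)
y = torch.randint(0, 3, (128,))
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 1e-3                                    # penalty strength (made up)

for epoch in range(50):
    optimizer.zero_grad()
    # Custom penalty term P added directly to the data loss
    penalty = sum(p.abs().sum() for p in model.parameters())
    loss = criterion(model(X), y) + lam * penalty
    loss.backward()                           # autograd differentiates P too
    optimizer.step()
```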