In the linked paper, a single set of features is developed for all classes. The Bhattacharyya distance measures how separable two Gaussian distributions are (it is the negative log of the Bhattacharyya coefficient; the coefficient is bounded between 0 and 1, but the distance itself is not). The article doesn't appear to describe specifically how the Bhattacharyya distance is used (the average of a matrix of inter-class distances?). But once you have your Bhattacharyya-based criterion, there are a few ways to select features. You can start with an empty set and progressively add the feature that makes the classes most separable. Or you can start with all the features and progressively discard the one that contributes the least separability. The Plus l-Take Away r algorithm alternates between those two approaches.
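For reference, the Bhattacharyya distance between two Gaussians $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ has the closed form

$$D_B = \frac{1}{8}(\mu_1-\mu_2)^\top\Sigma^{-1}(\mu_1-\mu_2) + \frac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}}, \qquad \Sigma = \frac{\Sigma_1+\Sigma_2}{2}.$$

Here is a minimal sketch of forward selection driven by that distance. Since the paper doesn't say how the pairwise class distances are combined into one criterion, averaging them is my assumption, and the function names are my own:

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians."""
    cov = (cov1 + cov2) / 2
    diff = mu1 - mu2
    return (diff @ np.linalg.solve(cov, diff) / 8
            + 0.5 * np.log(np.linalg.det(cov)
                           / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))

def separability(X, y, feats):
    """Average pairwise Bhattacharyya distance over all class pairs
    (averaging is an assumption; the paper doesn't specify)."""
    classes = np.unique(y)
    stats = {c: (X[y == c][:, feats].mean(axis=0),
                 np.atleast_2d(np.cov(X[y == c][:, feats], rowvar=False)))
             for c in classes}
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    return np.mean([bhattacharyya(*stats[a], *stats[b]) for a, b in pairs])

def forward_select(X, y, k):
    """Greedily add the feature that most improves the criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining, key=lambda f: separability(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Example: pick the 2 most discriminative of Iris's 4 features
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print(forward_select(X, y, k=2))
```

Note the estimated class covariances must be non-singular, so each class needs more samples than candidate features; backward elimination would just run the same greedy loop in reverse.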
Once the subset of original features has been selected, the feature reduction step reduces dimensionality through some transformation of the selected features. As you quoted, the authors used both PCA and LDA. The important distinction between the two is that PCA ignores the training class labels, so to reduce dimensionality you must choose how much of the variance to retain. LDA, by contrast, tries to maximize separability of the classes (by maximizing the ratio of between-class to within-class scatter) and yields at most one fewer feature than the number of classes.
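To make the contrast concrete, here's what the two reductions look like in scikit-learn (not the authors' code; the 95% variance threshold is just an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)      # 4 features, 3 classes

# PCA never sees y; you pick dimensionality via retained variance.
pca = PCA(n_components=0.95)           # keep 95% of the total variance
X_pca = pca.fit_transform(X)

# LDA uses y and yields at most n_classes - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_pca.shape, X_lda.shape)        # e.g. (150, 2) and (150, 2)
```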
But the important point here is that after feature selection and reduction, the same set of features is used for all classes.