Question

I was trying to replicate a study that involved machine learning. In it, the researchers used both feature selection and feature reduction before applying a Gaussian classifier for classification.

My question is as follows: say I have 3 classes, and I select (say) the top 3 features for each class from a total of (say) 10 features. The selected features are, for example:

Class 1: F1 F2 F9
Class 2: F3 F4 F9
Class 3: F1 F5 F10

Since principal component analysis and linear discriminant analysis both work on the complete dataset, or at least on datasets in which all classes share the same features, how do I perform feature reduction on such a set and then perform training?

Here is the link to the paper: Speaker Dependent Audio Visual Emotion Recognition

The following is an excerpt from the paper:

The top 40 visual features were selected with Plus l-Take Away r algorithm using Bhattacharyya distance as a criterion function. The PCA and LDA were then applied to the selected feature set and finally single component Gaussian classifier was used for classification.

Was it helpful?

Solution

In the linked paper, a single set of features is developed for all classes. The Bhattacharyya distance measures how separable two Gaussian distributions are. The paper doesn't appear to describe exactly how this distance is used (the average of a matrix of inter-class distances, perhaps?), but once you have a Bhattacharyya-based metric, there are a few ways to select features. You can start with an empty set and progressively add the features that make the classes most separable, or you can start with all the features and progressively discard those that contribute the least separability. The Plus l-Take Away r algorithm combines these two approaches: it alternates forward steps (adding l features) with backward steps (removing r features), which lets it undo earlier choices that later turn out to be suboptimal.
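To make that concrete, here is a minimal sketch in Python/NumPy. It assumes the criterion is the average pairwise Bhattacharyya distance between class-conditional Gaussians (the paper does not spell out how pairwise distances are aggregated), and the function names and the defaults l=2, r=1 are illustrative choices, not the authors' code.

```python
import numpy as np
from itertools import combinations

def bhattacharyya(X1, X2, eps=1e-6):
    """Bhattacharyya distance between Gaussians fitted to two samples."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Regularize the covariances so small samples stay invertible.
    S1 = np.cov(X1, rowvar=False) + eps * np.eye(X1.shape[1])
    S2 = np.cov(X2, rowvar=False) + eps * np.eye(X2.shape[1])
    S = (S1 + S2) / 2.0
    d = mu1 - mu2
    return (d @ np.linalg.solve(S, d) / 8.0
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

def criterion(X, y, feats):
    """Average Bhattacharyya distance over all class pairs (an assumption;
    the paper does not say how the per-pair distances are combined)."""
    classes = np.unique(y)
    return np.mean([bhattacharyya(X[y == a][:, feats], X[y == b][:, feats])
                    for a, b in combinations(classes, 2)])

def plus_l_take_away_r(X, y, target, l=2, r=1):
    """Add the l features that most improve the criterion, then drop the r
    whose removal hurts it least, until `target` features are selected."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < target:
        for _ in range(l):  # forward ("plus l") step
            best = max(remaining, key=lambda f: criterion(X, y, selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(r):  # backward ("take away r") step
            worst = min(selected,
                        key=lambda f: criterion(X, y,
                                                [s for s in selected if s != f]))
            selected.remove(worst)
            remaining.append(worst)
    return sorted(selected)
```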

Once the subset of original features has been selected, the feature-reduction step reduces dimensionality through some transformation of the original features. As you quoted, the authors used both PCA and LDA. The important distinction between the two is that PCA is independent of the training class labels, so to reduce dimensionality you must choose how much of the variance to retain. LDA, by contrast, tries to maximize separability of the classes (by maximizing the ratio of between-class to within-class covariance) and yields at most one fewer feature than the number of classes.
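As a sketch of that reduction-plus-classification stage, the scikit-learn pipeline below chains PCA (keeping, say, 95% of the variance, a threshold you must choose) into LDA (which projects onto at most n_classes - 1 directions) into a Gaussian classifier. `GaussianNB` is used here only as a convenient stand-in for the paper's single-component Gaussian classifier; the paper does not say which implementation was used.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

pipeline = make_pipeline(
    PCA(n_components=0.95),        # label-free; you choose the variance to keep
    LinearDiscriminantAnalysis(),  # supervised; at most n_classes - 1 components
    GaussianNB(),                  # stand-in single-component Gaussian classifier
)
```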

But the important point here is that after feature selection and reduction, the same set of features is used for all classes.
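Putting the two sketches above together on hypothetical arrays `X` (n_samples x n_features visual features) and `y` (class labels), the whole chain would look like this; note that the one shared feature subset is applied identically at training and test time:

```python
feats = plus_l_take_away_r(X, y, target=5)  # one shared subset for every class
pipeline.fit(X[:, feats], y)                # PCA -> LDA -> Gaussian classifier
y_pred = pipeline.predict(X[:, feats])      # same feature subset at test time
```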

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow