Question

For one of my assignments in my AI class we were tasked with creating a perceptron learning implementation of the Widrow-Hoff delta rule. I've coded this implementation in Java.

The following github link contains the project: https://github.com/dmcquillan314/CS440-Homework/tree/master/CS440-HW2-1

The issue that I'm having is not with the creation of the perceptron. That is working fine.

In the project after training the perceptron I then applied an unclassified dataset to the perceptron to then learn the classifications of each input vector. This also worked fine.

My issue pertains to learning which feature of the inputs is the most important.

For example, suppose the features within each input vector were color, car model, and car make, and we wanted to determine which feature is the most important. How would one go about doing so?

My original understanding led me to believe the answer was calculating the correlation coefficient between the value of that feature across all inputs and the classification vector that is produced. However, this turned out to be a false assumption.

Is there some other way that the most important feature can be learned?

EDIT

Sample weight vector:

( -752, 4771, 17714, 762, 6, 676, 3060, -2004, 5459, 9591.299, 3832, 14963, 20912 )

Sample input vectors:

(55, 1, 2, 130, 262, 0, 0, 155, 0, 0, 1, 0, 3, 0)

(59, 1, 3, 126, 218, 1, 0, 134, 0, 2.2, 2, 1, 6, 1)

(45, 1, 2, 128, 308, 0, 2, 170, 0, 0, 1, 0, 3, 0)

(59, 1, 4, 110, 239, 0, 2, 142, 1, 1.2, 2, 1, 7, 1)

The last element is the classification.

I will post an answer here when I find one. So far I believe that the answer given by the instructor is inaccurate.

Was this helpful?

Solution 2

This turned out to be a lot simpler than I originally thought. The answer/process is as follows:

Given a set of input vectors such as the following:

[1,0,1,0], [0,1,0,1]

The data is already constrained between 0 and 1 to minimize the variance. However, in the case of my data I have something more like the following:

[0,145,0,132],[0,176,0,140]

This causes the variance of some input features to be much larger than others, so the weight vector cannot be used as an indicator of feature importance. Therefore, for the weight vector to be an indicator of importance, we must first normalize the data by dividing each feature by its maximum.

For the above set, the vector of per-feature maxima would be: [0,176,0,140]

This results in a set of uniformly scaled feature vectors, and the weight vector then becomes an indicator of feature importance.
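The normalization step above can be sketched as follows. This is a minimal illustration, not code from the linked project; the class and method names are my own.

```java
import java.util.Arrays;

// Sketch: scale each feature by its column maximum so that learned
// perceptron weights become comparable across features.
public class FeatureScaling {

    // Divide every feature value by the maximum absolute value observed for
    // that feature across the dataset (columns whose max is 0 are left as-is).
    public static double[][] normalizeByMax(double[][] data) {
        int rows = data.length, cols = data[0].length;
        double[] max = new double[cols];
        for (double[] row : data)
            for (int j = 0; j < cols; j++)
                max[j] = Math.max(max[j], Math.abs(row[j]));
        double[][] scaled = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                scaled[i][j] = (max[j] == 0) ? data[i][j] : data[i][j] / max[j];
        return scaled;
    }

    public static void main(String[] args) {
        // The example vectors from above
        double[][] data = { {0, 145, 0, 132}, {0, 176, 0, 140} };
        double[][] scaled = normalizeByMax(data);
        // Every feature now lies in [0, 1]; e.g. scaled[0][1] == 145.0 / 176.0
        System.out.println(Arrays.deepToString(scaled));
    }
}
```

Note the guard for all-zero columns: dividing by a zero maximum would produce NaN, so such columns are passed through unchanged.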

Other tips

The importance of a feature is captured by computing how much the learned model depends on a feature f.

A perceptron is a simple feed-forward neural network, and for a neural network (which is a real-valued nonlinear function), dependency corresponds to the partial derivative of the output function with respect to f.

The relative importance of a feature is proportional to its average absolute weight on a trained perceptron. This is not always true for neural networks in general. For instance, this need not hold true for multi-layer perceptrons.

For more details (typing the exact formula here will be a notational mess), look at sections 2 and 3 of this paper. I believe equation (8) (in section 3) is what you are looking for.

There, the score is a summation over multiple learners. If yours is a single-layer perceptron, the function learned is a single weight vector:

w = (w1, w2, ... wn)

Then, the average absolute weight I mention at the beginning is simply the absolute weight |wi| of the i-th feature. This seems too simple a measure to be ranking the importance of features, right? But ... if you think about it, an n-dimensional input x gets transformed to w . x (the vector dot product). That is, the i-th weight wi fully controls how much the input changes along one dimension of the vector space.
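Ranking features by |wi| is a one-liner in spirit; here is a small self-contained sketch (names are my own, assuming a trained single-layer perceptron's weight vector, such as the sample one above):

```java
import java.util.stream.IntStream;

// Sketch: rank the features of a trained single-layer perceptron
// by the absolute value of their weights.
public class WeightRanking {

    // Returns feature indices sorted from most to least important,
    // i.e. by descending |w_i|.
    public static int[] rankByAbsWeight(double[] w) {
        return IntStream.range(0, w.length)
                .boxed()
                .sorted((a, b) -> Double.compare(Math.abs(w[b]), Math.abs(w[a])))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        // First four weights from the sample weight vector above
        double[] w = { -752, 4771, 17714, 762 };
        int[] order = rankByAbsWeight(w);
        // Most important first: index 2 (|17714|), then 1, 3, 0
        System.out.println(java.util.Arrays.toString(order));
    }
}
```

Remember that this ranking is only meaningful if the inputs were normalized to a common scale before training, as discussed in the accepted answer.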

By the way, in most (if not all) classifiers, the feature weight is itself the measure of its importance. It's just that the weights are computed in more complicated ways for most other classifiers.

Because perceptron learning, and especially a multi-layer perceptron network, is a black-box model whose weights and activations are each influenced by many or all of the features, there is no direct way to extract feature importance from it, whereas this is easy for tree-based models. However, we can use the permutation importance method introduced here: https://towardsdatascience.com/feature-importance-with-neural-network-346eb6205743
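The idea of permutation importance is model-agnostic: shuffle one feature column at a time and measure how much the model's accuracy drops. Below is a hedged sketch of that idea; the toy model and names are my own, not code from the linked article.

```java
import java.util.Random;
import java.util.function.Function;

// Sketch of permutation importance: importance of feature j is the baseline
// accuracy minus the accuracy after randomly shuffling column j.
// (In practice one averages over several shuffles.)
public class PermutationImportance {

    public static double accuracy(Function<double[], Integer> predict,
                                  double[][] x, int[] y) {
        int correct = 0;
        for (int i = 0; i < x.length; i++)
            if (predict.apply(x[i]) == y[i]) correct++;
        return (double) correct / x.length;
    }

    public static double importance(Function<double[], Integer> predict,
                                    double[][] x, int[] y, int j, Random rng) {
        double baseline = accuracy(predict, x, y);
        double[][] permuted = new double[x.length][];
        for (int i = 0; i < x.length; i++) permuted[i] = x[i].clone();
        // Fisher-Yates shuffle applied only to column j
        for (int i = permuted.length - 1; i > 0; i--) {
            int k = rng.nextInt(i + 1);
            double tmp = permuted[i][j];
            permuted[i][j] = permuted[k][j];
            permuted[k][j] = tmp;
        }
        return baseline - accuracy(predict, permuted, y);
    }

    public static void main(String[] args) {
        // Toy model: predict 1 iff feature 0 > 0.5; feature 1 is irrelevant
        Function<double[], Integer> model = v -> v[0] > 0.5 ? 1 : 0;
        double[][] x = { {0.9, 0.1}, {0.8, 0.9}, {0.1, 0.2}, {0.2, 0.8} };
        int[] y = { 1, 1, 0, 0 };
        Random rng = new Random(42);
        System.out.println("feature 0: " + importance(model, x, y, 0, rng));
        // Shuffling the irrelevant feature never changes predictions,
        // so its importance is exactly 0
        System.out.println("feature 1: " + importance(model, x, y, 1, rng));
    }
}
```

A drop in accuracy after shuffling a feature means the model was relying on it; a near-zero drop means the feature can likely be discarded.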

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow