Understanding the output of the NLTK MaxEnt classifier

https://stackoverflow.com/questions/16266842

13-04-2022

Question

I am trying to understand the output of classifier.show_most_informative_features(10) for the MaxEnt classifier. I don't understand what the columns indicate, for example in the following output:

train on 460 instances, test on 154 instances
accuracy: 0.61038961039
pos precision: 0.432989690722
pos recall: 0.893617021277
neg precision: 0.912280701754
neg recall: 0.485981308411
-4.141 need==True and label is 'REL'
3.395 approves==True and label is 'IRREL'
3.308 took==True and label is 'IRREL'
-1.766 treat==True and label is 'REL'
-1.488 tired==True and label is 'IRREL'
-1.295 gave==True and label is 'IRREL'
0.879 need==True and label is 'IRREL'

Solution

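Your output shows two labels, 'REL' and 'IRREL' (presumably relevant and irrelevant). When there are two labels, one is normally named "1" or positive (pos in your output) and the other "-1" or negative (neg). The output itself does not say which of your two labels is treated as positive.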

During the training process, the classifier analysed the features of the 460 training instances and weighted them according to their ability to distinguish well between the two labels. The details of the weighting process depend on the algorithm you chose.

Positive precision: of the testing instances that were classified as label 1, 43 % really have the label 1.

Positive recall: 89 % of the label 1 instances in the testing set were found, i.e. classified as label 1.

Negative precision and negative recall are defined the same way, but for label -1.

Accuracy: 61 % of the 154 testing instances were labeled correctly.
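
To make these definitions concrete, here is a minimal sketch that recomputes the five figures from a confusion matrix. The four counts are not part of the original output. They are integer counts inferred from the reported ratios and the total of 154 test instances, so treat them as an assumption.

```python
# Confusion-matrix counts inferred from the reported ratios (an assumption,
# not printed by the classifier): 47 gold "pos" and 107 gold "neg" instances.
tp, fn = 42, 5    # gold pos: predicted pos / predicted neg
fp, tn = 55, 52   # gold neg: predicted pos / predicted neg

total = tp + fn + fp + tn

accuracy = (tp + tn) / total        # correctly labeled / all test instances
pos_precision = tp / (tp + fp)      # correct among those predicted pos
pos_recall = tp / (tp + fn)         # gold pos instances that were found
neg_precision = tn / (tn + fn)      # correct among those predicted neg
neg_recall = tn / (tn + fp)         # gold neg instances that were found

print(f"accuracy:      {accuracy:.12f}")
print(f"pos precision: {pos_precision:.12f}")
print(f"pos recall:    {pos_recall:.12f}")
print(f"neg precision: {neg_precision:.12f}")
print(f"neg recall:    {neg_recall:.12f}")
```

Under these counts, 97 instances were predicted pos but only 42 of them were correct, which explains the combination of high positive recall and low positive precision.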

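The features are sorted by the absolute value of their weight, which corresponds to their relevance for the classification. Each line pairs a feature with a label: a positive weight makes that label more likely when the feature is present, and a negative weight makes it less likely. The most "helpful" feature in this case was need. Its weight of -4.141 for 'REL' means that if need is true, this is a very good hint that the label of the instance should not be 'REL', which is consistent with the positive weight of 0.879 for need with 'IRREL'.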
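
The following sketch illustrates how such weights can be read. It is not NLTK's code: label_probs is a hypothetical helper, the base of 2 is an assumption about how the weights are scaled, and every (feature, label) pair missing from the listing is assumed to have weight 0. It sums the weights of the active pairs per label, exponentiates, and normalizes.

```python
# Weights copied from the listing in the question: (feature, label) -> weight.
weights = {
    ("need", "REL"): -4.141,
    ("approves", "IRREL"): 3.395,
    ("took", "IRREL"): 3.308,
    ("treat", "REL"): -1.766,
    ("tired", "IRREL"): -1.488,
    ("gave", "IRREL"): -1.295,
    ("need", "IRREL"): 0.879,
}

# Sorting by absolute weight gives the same order as the listing.
ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)


def label_probs(active_features, labels=("REL", "IRREL"), base=2.0):
    """Hypothetical helper: turn summed weights into label probabilities.

    Pairs that are not in `weights` count as 0 (an assumption, since the
    listing only shows the top entries).
    """
    scores = {
        label: base ** sum(weights.get((feat, label), 0.0)
                           for feat in active_features)
        for label in labels
    }
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


probs = label_probs(["need"])
print(ranked[0])   # the pair with the largest absolute weight
print(probs)       # 'IRREL' dominates when only need is present
```

Whatever base is used, the direction is the same: a negative weight on (need, 'REL') pushes probability away from 'REL'.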

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow