Question

I have a multi-class classification problem on a data set with 6 target classes. The training data has a skewed distribution of the class labels; below are the counts for each class label (1 to 6):

array([174171, 12, 29, 8285, 9996, 11128])

I am using vowpal wabbit's oaa scheme to classify and have tried the default weight of 1.0 for each example. However, for most models this just results in the model predicting label 1 for all examples in the evaluation (as label 1 has a very large representation in the training set).

I am now trying to experiment with different weights applied to the examples of each class to help boost the performance of the classifier.

Any pointers or practical tips on techniques for deciding the weight of each example would be very useful. One possible technique is to weight the examples in inverse proportion to the frequency of their class. Unfortunately, this seems to result in the classifier being biased heavily towards labels 2 and 3, predicting 2 or 3 for almost everything in the evaluation.

Would the model choice play a role in deciding the weights? I am experimenting with neural networks and with the logistic and hinge loss functions.


Solution

There may be better approaches, but I would start, like you did, by weighting the examples in inverse proportion to the frequency of their labels, as follows:

Sum of counts of labels = 174171 + 12 + 29 + 8285 + 9996 + 11128 = 203621, so:

Label 1, appearing 174171 times (85.5% of the total), would be weighted 203621/174171 = 1.16909.

Label 2, appearing 12 times (the rarest), would be weighted 203621/12 = 16968.4.

and so on.
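A minimal sketch of that computation, using the label counts from the question and emitting lines in VW's "label importance | features" input format (the helper and feature names here are illustrative, not part of vw):

```python
# Sketch: compute inverse-frequency importance weights and emit
# VW-format lines ("label importance | features") for --oaa training.
# Label counts are from the question; helper and feature names are illustrative.

counts = {1: 174171, 2: 12, 3: 29, 4: 8285, 5: 9996, 6: 11128}
total = sum(counts.values())  # 203621

# Inverse-frequency weight for each label: total / count(label)
weights = {label: total / count for label, count in counts.items()}
# weights[1] ~= 1.169, weights[2] ~= 16968.4, ...

def to_vw_line(label, features):
    """Format one example as 'label importance | name:value ...'."""
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} {weights[label]:.4f} | {feats}"

print(to_vw_line(2, {"f1": 0.5, "f2": 1.2}))
# -> 2 16968.4167 | f1:0.5 f2:1.2
```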

Make sure the examples in the train-set are well shuffled. This is of critical importance in online learning: having examples with the same label lumped together is a recipe for very poor online performance.
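If the training set fits in memory, a one-off shuffle is simple (file names here are illustrative):

```python
# Sketch: shuffle the training file once before feeding it to vw.
import random

with open("train.vw") as f:
    lines = f.readlines()

random.shuffle(lines)  # break up long runs of same-label examples

with open("train.shuffled.vw", "w") as f:
    f.writelines(lines)
```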

If you did shuffle well and you still get bad performance on new examples, you can reweight less aggressively: for example, take the sqrt() of the inverse weights, and if that is still too aggressive, switch to log() of the inverse weights, etc.
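A quick way to compare those damping schemes, reusing the counts above (the max(1.0, ...) floor is my own addition to keep the majority class at weight 1, not something vw requires):

```python
# Sketch: progressively milder re-weighting schemes for the same counts.
import math

counts = {1: 174171, 2: 12, 3: 29, 4: 8285, 5: 9996, 6: 11128}
total = sum(counts.values())

inverse = {lbl: total / c for lbl, c in counts.items()}             # full inverse frequency
sqrt_w = {lbl: math.sqrt(w) for lbl, w in inverse.items()}          # less aggressive
log_w = {lbl: max(1.0, math.log(w)) for lbl, w in inverse.items()}  # milder still

# inverse[2] ~= 16968.4, sqrt_w[2] ~= 130.3, log_w[2] ~= 9.7
```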

Another approach is to use one of the newer cost-sensitive multi-class options, e.g. --csoaa. The VW wiki on GitHub has some examples with details on how to use these options and their training-set formats.
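As a rough illustration of the format (to the best of my understanding, --csoaa expects one label:cost pair per class before the bar, with lower cost meaning preferred; check the wiki examples rather than relying on this sketch):

```python
# Sketch of a --csoaa training line: one label:cost pair per class before the
# bar, lower cost = preferred. The cost values are illustrative; the cost of
# missing a rare class could, for instance, be scaled up using the weights above.
n_classes = 6

def to_csoaa_line(true_label, features, wrong_cost=1.0):
    costs = " ".join(
        f"{lbl}:{0.0 if lbl == true_label else wrong_cost}"
        for lbl in range(1, n_classes + 1)
    )
    return f"{costs} | {features}"

print(to_csoaa_line(2, "f1:0.5 f2:1.2"))
# -> 1:1.0 2:0.0 3:1.0 4:1.0 5:1.0 6:1.0 | f1:0.5 f2:1.2
```

Such a file would then be trained with vw --csoaa 6 rather than --oaa 6.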

The loss function chosen should definitely have an effect. However, note that when using multi-class, or any other reduction-based option in vw, you should generally leave --loss_function alone and let the algorithm use its built-in default. If you try a different loss function and get better results than the reduction's built-in loss function, this may be of interest to the developers of vw; please report it as a bug.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow