Question

Suppose we have a game and its action space contains two possible actions: A and B.

We have a labelled dataset of state-action pairs, but 95% of the actions are A and only 5% are B.

If we train a neural network on it, it will almost always output A, since predicting the majority class is an easy way to decrease its loss.

Are there ways to solve this problem?


Solution

These techniques are intended for classification, and I am not sure whether they can be extended to reinforcement learning.

As you figured out, accuracy should not be used as a metric for a dataset as imbalanced as the one you have. Instead, you should look at a metric such as Area Under Curve (AUC). If you had infinite data, you could simply rebalance by removing some samples from the majority class. However, in many cases data is scarce and you want to use as much of it as possible. Removing data can have a disastrous effect in many applications.
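To make the metric point concrete, here is a self-contained sketch of ROC AUC (computed by brute force over positive/negative pairs) showing why a constant majority-class predictor scores 95% accuracy but only 0.5 AUC:

```python
def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative
    (ties count half). This is the ROC AUC, computed by brute force."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that always predicts the majority class gets 95% accuracy
# on a 95/5 split, but its constant score gives an uninformative AUC of 0.5.
y_true = [0] * 95 + [1] * 5
constant_score = [0.0] * 100
print(auc(y_true, constant_score))  # 0.5
```

In practice you would use a library routine (e.g. scikit-learn's `roc_auc_score`) rather than this quadratic loop, but the pairwise definition makes the imbalance-insensitivity easy to see.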

So what are good and convenient ways of handling this?

  • Add weights to the loss function: one weight for class A and one for class B. Increasing the magnitude of the loss for the B class keeps the model from getting stuck in a suboptimal solution that predicts only one class.

  • Use another objective (loss) function. A differentiable approximation of the F1-score, for example, can be implemented and used as a loss.
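The first bullet can be sketched as a weighted binary cross-entropy in NumPy; the inverse-frequency weights below are one common heuristic, not the only choice:

```python
import numpy as np

def weighted_bce(y_true, p_pred, w_pos, w_neg):
    """Binary cross-entropy with per-class weights; increasing w_pos makes
    mistakes on the rare class (B, labelled 1) more costly."""
    eps = 1e-12
    p = np.clip(p_pred, eps, 1 - eps)
    losses = -(w_pos * y_true * np.log(p) + w_neg * (1 - y_true) * np.log(1 - p))
    return losses.mean()

# Weight classes inversely to their frequency: 95% A (label 0), 5% B (label 1).
w_neg, w_pos = 1 / 0.95, 1 / 0.05

y = np.array([0, 0, 1])
p = np.array([0.1, 0.2, 0.3])  # predicted probability of class B
print(weighted_bce(y, p, w_pos, w_neg))
```

Most frameworks expose this directly (e.g. a `weight` argument on the loss), so you rarely need to hand-roll it; the sketch just shows where the per-class factor enters.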

What is great about these approaches is that they let you use all the data.
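The F1-as-loss idea above is usually implemented as a "soft" F1: replace hard 0/1 predictions with probabilities so the counts become differentiable. A minimal NumPy sketch (the function name is mine, not a library API):

```python
import numpy as np

def soft_f1_loss(y_true, p_pred):
    """Differentiable surrogate for 1 - F1: using probabilities instead of
    hard predictions makes TP/FP/FN continuous quantities."""
    tp = np.sum(p_pred * y_true)
    fp = np.sum(p_pred * (1 - y_true))
    fn = np.sum((1 - p_pred) * y_true)
    soft_f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
    return 1 - soft_f1

y = np.array([0, 0, 0, 1])
print(soft_f1_loss(y, np.array([0.0, 0.0, 0.0, 1.0])))  # perfect: ~0
print(soft_f1_loss(y, np.array([0.0, 0.0, 0.0, 0.0])))  # misses B: ~1
```

Unlike plain cross-entropy, this loss barely rewards the always-predict-A solution, since that solution has zero true positives.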

Other tips

You have tagged the question with reinforcement-learning, but you describe a labeled dataset, suggesting supervised learning. I will try to cover both cases.

There are some techniques that are applicable in both supervised learning and reinforcement learning:

  • Sample from the buffer conditioning on the action, to have a balanced dataset regarding action taken.
  • Apply data augmentation techniques on the minority action class. The Synthetic Minority Oversampling Technique (SMOTE) algorithm may be an option for that. The problem with data augmentation is that you would need to do it in the RL loop, which can increase the required computation time.
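The first bullet, sampling from the buffer conditioned on the action, can be sketched as follows (the `sample_balanced` helper and the tuple layout `(state, action)` are illustrative assumptions, not any particular library's API):

```python
import random
from collections import defaultdict

def sample_balanced(buffer, batch_size):
    """Draw a batch with equal counts per action by sampling with
    replacement within each action's transitions."""
    by_action = defaultdict(list)
    for transition in buffer:            # transition = (state, action, ...)
        by_action[transition[1]].append(transition)
    actions = list(by_action)
    per_action = batch_size // len(actions)
    batch = []
    for a in actions:
        batch.extend(random.choices(by_action[a], k=per_action))
    return batch

# Hypothetical buffer: 95 transitions with action "A", 5 with action "B".
buffer = [(i, "A") for i in range(95)] + [(i, "B") for i in range(5)]
batch = sample_balanced(buffer, 32)
print(sum(1 for _, a in batch if a == "B"))  # 16: half the batch is B
```

Sampling with replacement matters here: the 5 B-transitions get reused many times, which is exactly the rebalancing effect you want without discarding any A data.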

Note that to apply them for reinforcement learning, you should use a replay buffer, like they do in the DeepMind Atari paper.

If you are in a supervised learning scenario, you can apply class weights, e.g. this example in Keras.
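A sketch of how such a weight dictionary might be computed (the `class_weights` helper is my own; the formula matches scikit-learn's "balanced" heuristic, and the resulting dict is the shape that Keras's `class_weight` argument to `model.fit` expects):

```python
from collections import Counter

def class_weights(labels):
    """'Balanced' weights: n_samples / (n_classes * count_c), the same
    heuristic scikit-learn's compute_class_weight uses."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

labels = [0] * 95 + [1] * 5   # 0 = action A, 1 = action B
weights = class_weights(labels)
print(weights)                # B gets weight 10.0, A about 0.53
# In Keras this dict would be passed as model.fit(..., class_weight=weights).
```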

For imbalanced classes, the method I prefer the most is bootstrapping.

  1. Let's say you have n classes with m, 2m, and 3m examples (this is just to mark which class is the smallest).

  2. Create multiple datasets by randomly drawing m samples from each class.

  3. Keep training on each one of them.
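The steps above can be sketched as follows; `bootstrap_datasets` is a hypothetical helper that builds several balanced datasets, each with m samples per class where m is the size of the smallest class:

```python
import random

def bootstrap_datasets(samples_by_class, n_datasets, seed=0):
    """Build n_datasets balanced datasets, each holding m random samples
    per class, where m is the size of the smallest class."""
    rng = random.Random(seed)
    m = min(len(s) for s in samples_by_class.values())
    datasets = []
    for _ in range(n_datasets):
        ds = []
        for cls, samples in samples_by_class.items():
            ds.extend((x, cls) for x in rng.sample(samples, m))
        datasets.append(ds)
    return datasets

data = {"A": list(range(95)), "B": list(range(5))}
datasets = bootstrap_datasets(data, n_datasets=3)
print(len(datasets), len(datasets[0]))  # 3 datasets of 10 samples (5 per class)
```

Each dataset is balanced, and across many datasets most of the majority-class data still gets seen, so little information is thrown away overall.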

As people have mentioned above, you want to try to up-sample / bootstrap. In other words, you want to bring the classes to similar proportions. One way to do this is to simply select samples from the minority class at random, with replacement.
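Random up-sampling can be sketched in a few lines (the `upsample_minority` name is mine; real projects often use imbalanced-learn's `RandomOverSampler` instead):

```python
import random

def upsample_minority(majority, minority, seed=0):
    """Randomly repeat minority samples (with replacement) until both
    classes have the same size."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority, minority + extra

a = list(range(95))   # class A samples
b = list(range(5))    # class B samples
a, b = upsample_minority(a, b)
print(len(a), len(b))  # 95 95
```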

More complicated solutions:

  1. Add realistic noise to the less likely class to increase the number of data points.

  2. Use a different score/error function - look at balanced accuracy.

  3. Initiate the training with 50% A and 50% B - once it converges, start training it gradually on a larger part of the dataset, which will gradually become 95% A and 5% B.
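The third solution above, a curriculum that starts at 50/50 and anneals toward the true 95/5 distribution, could be sketched like this (the linear schedule and the `curriculum_batch` helper are assumptions of mine, not a standard recipe):

```python
import random

def curriculum_batch(a_pool, b_pool, step, total_steps, batch_size, rng):
    """Start at a 50/50 A/B mix and anneal linearly toward the true
    95/5 distribution as training progresses."""
    t = step / total_steps            # goes 0 -> 1 over training
    p_b = 0.5 * (1 - t) + 0.05 * t    # fraction of B samples in the batch
    n_b = round(batch_size * p_b)
    batch = rng.choices(b_pool, k=n_b) + rng.choices(a_pool, k=batch_size - n_b)
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
first = curriculum_batch(["A"], ["B"], 0, 100, 20, rng)
last = curriculum_batch(["A"], ["B"], 100, 100, 20, rng)
print(first.count("B"), last.count("B"))  # 10 1
```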

Licensed under: CC-BY-SA with attribution