Reinforcement learning: decreasing loss without increasing reward

https://datascience.stackexchange.com/questions/37792

reinforcement-learning
q-learning

31-10-2019

Question

I'm trying to solve OpenAI Gym's LunarLander-v2.

I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score.

Generally the loss decreases over many episodes but the reward doesn't improve much.

How should I interpret this? If a lower loss means more accurate predictions of value, naively I would have expected the agent to take more high-reward actions.
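To make the quantity being discussed concrete, here is a minimal sketch of what the loss in Deep Q-Learning typically measures. It assumes the standard one-step TD target (reward plus discounted best next-state value) and uses a small Q-table in place of a network. The name `td_loss` and the toy transitions are illustrative assumptions, not taken from the post. The loss is computed against the agent's own bootstrapped targets, and only on transitions it has actually collected, so it can reach zero while the greedy policy is still poor.

```python
import numpy as np

def td_loss(q, states, actions, rewards, next_states, dones, gamma=0.99):
    """Mean squared one-step TD error for a tabular Q of shape (n_states, n_actions)."""
    q_taken = q[states, actions]
    # Bootstrapped target: reward plus discounted best next value (zero if terminal).
    next_best = q[next_states].max(axis=1)
    targets = rewards + gamma * next_best * (1.0 - dones)
    return float(np.mean((q_taken - targets) ** 2))

# Toy one-step problem (assumed): in state 0, action 0 pays 10 and action 1 pays 1.
# The collected transitions only ever contain action 1, so action 0 was never tried.
states = np.array([0, 0, 0])
actions = np.array([1, 1, 1])
rewards = np.array([1.0, 1.0, 1.0])
next_states = np.array([0, 0, 0])
dones = np.array([1.0, 1.0, 1.0])  # every transition ends the episode

# Q(0, 0) = 0 was never updated; Q(0, 1) = 1 matches the observed reward exactly.
q = np.array([[0.0, 1.0]])

loss = td_loss(q, states, actions, rewards, next_states, dones)
greedy_action = int(q[0].argmax())
print(loss, greedy_action)
```

In this toy case the loss on the collected data is zero, yet the greedy policy picks the action that pays 1 rather than the one that pays 10, because the better action never appears in the data the loss is computed on.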

Could this be a sign that the agent has not explored enough, or that it is stuck in a local minimum?
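One way to look into the exploration hypothesis is to inspect how actions are being selected. Below is a minimal sketch of epsilon-greedy selection with a linearly decaying epsilon, assuming that is the exploration scheme in use (the post does not say). The names `epsilon_at` and `select_action`, the schedule values, and the example Q-values are all hypothetical.

```python
import numpy as np

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
# LunarLander-v2 has 4 discrete actions; these Q-values are made up for illustration.
q_values = np.array([0.1, 0.5, -0.2, 0.0])

# Count how often each action is chosen halfway through the decay schedule.
eps = epsilon_at(5_000)
counts = np.bincount(
    [select_action(q_values, eps, rng) for _ in range(1000)],
    minlength=len(q_values),
)
print(eps, counts)
```

Logging action counts like this during training shows whether some actions are almost never tried once epsilon has decayed.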

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange