Reinforcement learning: decreasing loss without increasing reward
-
31-10-2019 - |
Вопрос
I'm trying to solve OpenAI Gym's LunarLander-v2.
I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score.
Generally the loss decreases over many episodes but the reward doesn't improve much.
How should I interpret this? If a lower loss means more accurate predictions of value, naively I would have expected the agent to take more high-reward actions.
Could this be a sign of the agent not having explored enough, of being stuck in a local minimum?
Нет правильного решения
Не связан с datascience.stackexchange