Question

I'm trying to solve OpenAI Gym's LunarLander-v2.

I'm using the Deep Q-Learning algorithm. I have tried various hyperparameters, but I can't get a good score.

Generally the loss decreases over many episodes but the reward doesn't improve much.enter image description here

How should I interpret this? If a lower loss means more accurate predictions of value, naively I would have expected the agent to take more high-reward actions.

Could this be a sign of the agent not having explored enough, of being stuck in a local minimum?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange
scroll top