Question

I'm new to reinforcement learning and am studying DQN with a decaying epsilon. I came across the following example:

EPISODES = 91

GAMMA = 0.2

EPSILON_DECAY = 0.999

MIN_EPSILON = 0.01

MAX_EPSILON = 1

My questions are:

  1. Is it a problem if epsilon never reaches MIN_EPSILON?
  2. Is something wrong with the reward? Instead of increasing, it decreases over time.

[Plot: EPSILON]

[Plot: Average reward]


Solution

  1. With an epsilon decay of 0.999, the number of episodes $x$ needed to reach the minimum epsilon follows from $$ \epsilon_{max} \cdot \epsilon_{decay}^x = \epsilon_{min} \\ 1 \cdot 0.999^x = 0.01 \\ x = \frac{\ln 0.01}{\ln 0.999} \approx 4603 $$ so you need about 4603 episodes. After 91 episodes you reach $$ \epsilon_{current} = \epsilon_{max} \cdot \epsilon_{decay}^{episodes} = 1 \cdot 0.999^{91} \approx 0.913 $$ which matches what you see in your plot. This is not a problem in itself, but keep in mind that the agent still picks over 91% of its moves at random.
  2. The average reward should trend upward as training progresses, not decrease. A decline can have several causes, for example a bug in the DQN implementation or a learning rate that is too high. The best way to debug is to start with the simplest environment possible, confirm that the model learns to play it, and only then increase the difficulty.
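The arithmetic in point 1 can be checked with a short script. This is only a sketch: it assumes epsilon is multiplied by EPSILON_DECAY once per episode and clipped at MIN_EPSILON, which the question does not show explicitly. The helper names `epsilon_after` and `episodes_to_reach_min` are made up for illustration.

```python
import math

# Values taken from the question.
EPISODES = 91
EPSILON_DECAY = 0.999
MIN_EPSILON = 0.01
MAX_EPSILON = 1.0


def epsilon_after(episodes: int) -> float:
    """Epsilon after `episodes` decay steps (assumed: one step per episode),
    clipped so it never drops below MIN_EPSILON."""
    return max(MIN_EPSILON, MAX_EPSILON * EPSILON_DECAY ** episodes)


def episodes_to_reach_min() -> int:
    """Smallest number of decay steps after which epsilon is at or below
    MIN_EPSILON, from solving MAX_EPSILON * EPSILON_DECAY**x = MIN_EPSILON."""
    return math.ceil(math.log(MIN_EPSILON / MAX_EPSILON) / math.log(EPSILON_DECAY))


print(f"epsilon after {EPISODES} episodes: {epsilon_after(EPISODES):.3f}")
print(f"episodes needed to reach MIN_EPSILON: {episodes_to_reach_min()}")
```

With the values from the question this prints roughly 0.913 and 4603. If the training run is meant to stay at 91 episodes, a smaller decay factor (or a decay applied per step rather than per episode) would be needed for epsilon to get anywhere near MIN_EPSILON.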
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange