Unbalanced discounted reward in reinforcement learning: is it a problem?
02-11-2019
Question
Discounted rewards seem unbalanced to me.
If we take as an example an episode with 4 actions, where each action receives a reward of +1:
+1 -> +1 -> +1 -> +1
The discounted reward for the last action is: 1
The discounted reward for the first action (taking gamma = 1 for simplicity) is: 4
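To make these numbers concrete, here is a minimal sketch of how I compute them, assuming the usual definition where the discounted reward of a step is its own reward plus gamma times the discounted reward of the next step. The function name `discounted_returns` is just something I made up for illustration; it is not from any library.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the value of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1, 1, 1, 1]  # the 4-action episode above
print(discounted_returns(rewards, gamma=1.0))  # [4.0, 3.0, 2.0, 1.0]
```

With gamma = 1 the first action gets 4 and the last action gets 1, which is the imbalance I am asking about.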
Intuitively, both actions are equally good, because both received the same reward.
But their total rewards are different, hence unbalanced.
So when we backpropagate, will the first action be favored over the last action?
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange