The meaning of γt−t0 in Reinforcement learning with pytorch
-
10-12-2020 - |
Question
When reading pytorch tutorial:
Our aim will be to train a policy that tries to maximize the discounted, cumulative reward
Rt0=∑∞t=t0γt−t0rt
, whereRt0
is also known as the return
I know γ is the discount factor, but I am not sure that what t-t0
ofγt−t0
mean?
Thank you.
Solution
I have no experience with reinformcement learning, however looking at the figure I think I understand what is meant. Gamma is the discount factor, which is taken to the power t-t0
, i.e. the number of episodes starting from t
. This gives the discount factor for a specific episode, which is then multiplied by the return of that episode, r_t
, to get the discounted reward for one specific episode. The total return is then computed by summing all the future rewards for future episodes.