Reinforcement learning: Discounting rewards in the REINFORCE algorithm
31-10-2019
Question
I am looking into the REINFORCE algorithm for reinforcement learning. I am having trouble understanding how rewards should be computed.
The algorithm from Sutton & Barto:
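For reference, here are the two key lines of the episodic REINFORCE box in the second edition of Sutton & Barto (Section 13.3). For each generated episode S_0, A_0, R_1, ..., S_(T-1), A_(T-1), R_T, the inner loop runs over t = 0, 1, ..., T-1. This transcription assumes the second-edition notation, in which R_(t+1) is the reward received after action A_t. Other editions may index the rewards differently.

$$
\begin{aligned}
G &\leftarrow \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k \qquad (G_t) \\
\theta &\leftarrow \theta + \alpha \, \gamma^{t} \, G \, \nabla \ln \pi(A_t \mid S_t, \theta)
\end{aligned}
$$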
What does G, the 'return from step t', mean here? Is it:
- the return from step t to step T-1, i.e. R_t + R_(t+1) + ... + R_(T-1)?
- the return from step 0 to step t, i.e. R_0 + R_1 + ... + R_t?
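In Sutton & Barto's definition, G_t is the discounted sum of the rewards that follow step t, up to the end of the episode. The following is a minimal sketch of how those returns can be computed for every step of one finished episode. It assumes the book's convention that the reward following action A_t is R_(t+1), so here `rewards[t]` holds R_(t+1). The function name `discounted_returns` is hypothetical, for illustration only.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t for every step t = 0..T-1 of one finished episode.

    Assumption: rewards[t] holds the reward received after action A_t
    (R_{t+1} in Sutton & Barto's notation), so len(rewards) == T.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # G_T = 0: no rewards follow the terminal step
    # Walk backwards so each G_t reuses G_{t+1}:
    #   G_t = R_{t+1} + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    # Hypothetical 3-step episode with gamma = 0.9
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.9))
```

The backward recursion gives the same values as evaluating the sum separately for each t, but in O(T) time instead of O(T^2). Note that in the book's update the step-t term is additionally scaled by gamma^t. This sketch computes only G_t.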
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange