Reinforcement learning: Discounting rewards in the REINFORCE algorithm

https://datascience.stackexchange.com/questions/38200

31-10-2019

Question

I am looking into the REINFORCE algorithm for reinforcement learning. I am having trouble understanding how rewards should be computed.

The algorithm from Sutton & Barto:

[Image: REINFORCE pseudocode from Sutton & Barto]

What does G, the 'return from step t', mean here?

The return from step t to step T-1, i.e. R_t + R_(t+1) + ... + R_(T-1)?
The return from step 0 to step t, i.e. R_0 + R_1 + ... + R_t?
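
For reference, Sutton & Barto define the return from step t as the discounted sum of the rewards received after step t until the end of the episode, G_t = R_(t+1) + gamma * R_(t+2) + ... + gamma^(T-t-1) * R_T. That is closest to the first of the two readings above, up to the book's convention of labelling the reward that follows the action at step t as R_(t+1). Below is a minimal Python sketch of that computation. The function name `returns_to_go`, the list layout (`rewards[t]` holds the reward received after acting at step t) and the sample numbers are assumptions made for illustration, not part of the book's pseudocode.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return a list G where G[t] is the discounted return from step t.

    Assumption: rewards[t] is the reward received after the action taken
    at step t (written R_(t+1) in Sutton & Barto's notation).
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the episode backwards so each return reuses the next one:
    # G[t] = rewards[t] + gamma * G[t + 1]
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Hypothetical three-step episode
print(returns_to_go([1.0, 2.0, 3.0], gamma=1.0))  # [6.0, 5.0, 3.0]
print(returns_to_go([1.0, 2.0, 3.0], gamma=0.5))  # [2.75, 3.5, 3.0]
```

With gamma = 1 each entry is the plain sum of the rewards from that step to the end of the episode, so rewards collected before step t do not contribute to G at step t.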

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange