Question

I am looking into the REINFORCE algorithm for reinforcement learning. I am having trouble understanding how rewards should be computed.

The algorithm from Sutton & Barto: [image not preserved]

What does G, 'return from step t' mean here?

  1. Return from step t to step T-1, i.e. R_t + R_(t+1) + ... + R_(T-1)?
  2. Return from step 0 to step t, i.e. R_0 + R_1 + ... + R_t?
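
To make the two readings concrete, here is a minimal sketch of the first one, where the return at step t sums the rewards collected from step t until the end of the episode. In Sutton & Barto's own indexing the reward that follows the action at step t is written R_(t+1), so their return is G_t = R_(t+1) + gamma * R_(t+2) + ... + gamma^(T-t-1) * R_T, which covers future rewards only. The function name `returns_to_go`, the list layout, and the example rewards are assumptions made for illustration, not something taken from the book.

```python
def returns_to_go(rewards, gamma=1.0):
    """Compute the discounted return from each step to the end of the episode.

    Assumption: rewards[t] is the reward received after the action taken at
    step t (written R_(t+1) in Sutton & Barto's indexing).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one after it:
    # G_t = reward_t + gamma * G_(t+1)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step episode.
episode_rewards = [1.0, 0.0, 2.0]
print(returns_to_go(episode_rewards))             # [3.0, 2.0, 2.0]
print(returns_to_go(episode_rewards, gamma=0.5))  # [1.5, 1.0, 2.0]
```

Under this reading, each return is paired with the log-probability gradient of the action taken at the same step, so rewards earned before step t do not contribute to the update for that action.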

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange