Why is “next state” kept in RL experience replay?
01-11-2019
Question
Following this explanation of what experience replay is (and others), I noticed that an experience element is defined as
$e_t = (s_t,a_t,r_t,s_{t+1})$
My question is: why do we need the next state in the experience? To my understanding, our networks learn state-to-action and action-to-reward mappings, so I fail to see where the next state is used in experience replay.
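For concreteness, here is a minimal sketch of where each element of $e_t = (s_t, a_t, r_t, s_{t+1})$ is typically consumed when a sampled experience is replayed. It assumes tabular Q-learning with made-up hyperparameters (`GAMMA`, `ALPHA`) and hypothetical `store`/`replay` helpers, none of which come from the linked explanation; the point is only that the next state appears in the bootstrapped TD target.

```python
# Sketch only: tabular Q-learning with an experience replay buffer.
# GAMMA, ALPHA, N_STATES, N_ACTIONS, store, and replay are assumed
# names for illustration, not from the original post.
import random
from collections import deque

import numpy as np

GAMMA = 0.99   # discount factor (assumed value)
ALPHA = 0.1    # learning rate (assumed value)
N_STATES, N_ACTIONS = 5, 2

Q = np.zeros((N_STATES, N_ACTIONS))   # tabular Q(s, a)
buffer = deque(maxlen=10_000)         # experience replay buffer

def store(s, a, r, s_next):
    """Store one experience tuple e_t = (s_t, a_t, r_t, s_{t+1})."""
    buffer.append((s, a, r, s_next))

def replay(batch_size=32):
    """Sample past experiences and apply a Q-learning update to each."""
    for s, a, r, s_next in random.sample(buffer, min(batch_size, len(buffer))):
        # This is where the next state is needed: the TD target
        # bootstraps on the best action value available in s_{t+1}.
        td_target = r + GAMMA * np.max(Q[s_next])
        Q[s, a] += ALPHA * (td_target - Q[s, a])
```

In deep RL the same role is played by the network (or a target network in DQN) evaluating $\max_a Q(s_{t+1}, a)$, so the stored $s_{t+1}$ is consumed by the update rule rather than by the state-to-action mapping itself.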
No accepted answer