Why is “next state” kept in RL experience replay?
01-11-2019

Question
Following this explanation on what is experience replay (and others), I noticed an experience element is defined as
$e_t = (s_t,a_t,r_t,s_{t+1})$
My question is: why do we need the next state $s_{t+1}$ in the experience? To my understanding, our networks learn state-to-action and action-to-reward mappings, so I fail to see where the next state is used in experience replay.
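For context on where $s_{t+1}$ appears: in Q-learning the update target for a sampled experience is $r_t + \gamma \max_{a'} Q(s_{t+1}, a')$, so the next state is needed to bootstrap the value estimate. Below is a minimal tabular sketch of this idea; all names and numbers (`n_states`, `gamma`, the hand-written buffer, etc.) are illustrative assumptions, not from the question.

```python
import random
import numpy as np

random.seed(0)

# Illustrative sketch: tabular Q-learning from a replay buffer.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5
Q = np.zeros((n_states, n_actions))

# Replay buffer of experiences e_t = (s_t, a_t, r_t, s_{t+1}).
buffer = [(0, 1, 1.0, 2), (2, 0, 0.0, 3), (3, 1, 2.0, 4)]

for _ in range(100):
    s, a, r, s_next = random.choice(buffer)
    # The TD target uses the NEXT state to bootstrap:
    # target = r + gamma * max_a' Q(s_{t+1}, a')
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```

Without `s_next` stored in each tuple, the `target` line could not be computed when the experience is replayed later, which is why the tuple includes it.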
No accepted answer
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange