Why is “next state” kept in RL experience replay?
01-11-2019
Question
Following this explanation of what experience replay is (and others), I noticed that an experience element is defined as
$e_t = (s_t,a_t,r_t,s_{t+1})$
My question is: why do we need the next state $s_{t+1}$ in the experience? To my understanding, our networks learn state-to-action and action-to-reward mappings, so I fail to see where the "next state" is used in experience replay.
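For context on where $s_{t+1}$ enters the picture: in Q-learning-style methods, the value of an action is not just its immediate reward but the reward plus the discounted value of the state it leads to, so replaying a stored experience requires the next state to compute the bootstrap target $r_t + \gamma \max_a Q(s_{t+1}, a)$. Below is a minimal tabular sketch (all names, states, and the toy buffer are hypothetical, chosen only for illustration):

```python
import random

GAMMA = 0.9  # discount factor
ALPHA = 0.5  # learning rate

# Tabular Q-function for a toy problem with 2 states and 2 actions.
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}

# Replay buffer of experiences e_t = (s_t, a_t, r_t, s_{t+1}).
# Hypothetical transitions: action 1 in state 0 yields reward 1 and
# leads to state 1; action 0 in state 1 yields reward 0, back to state 0.
buffer = [
    (0, 1, 1.0, 1),
    (1, 0, 0.0, 0),
]

def replay_update(batch):
    for s, a, r, s_next in batch:
        # The stored next state s_next is needed right here: the target
        # bootstraps from the best action value available in s_next.
        target = r + GAMMA * max(Q[(s_next, b)] for b in range(2))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

random.seed(0)
for _ in range(100):
    # Sample the buffer in random order and replay it.
    replay_update(random.sample(buffer, len(buffer)))

print(Q[(0, 1)])  # exceeds the immediate reward 1.0 due to bootstrapping
```

Without $s_{t+1}$ in the tuple, the update above could only regress $Q(s_t, a_t)$ toward the immediate reward $r_t$, losing all long-term value information; storing the next state is what lets the replayed sample reconstruct the full temporal-difference target.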
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange