Why is “next state” kept in RL experience replay?
01-11-2019

Question
Following this explanation on what is experience replay (and others), I noticed an experience element is defined as
$e_t = (s_t,a_t,r_t,s_{t+1})$
My question is: why do we need the next state $s_{t+1}$ in the experience? To my understanding, our networks learn state-to-action and action-to-reward mappings, so I fail to see where the next state is used in experience replay.
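For context on where $s_{t+1}$ appears: in Q-learning the update target for a sampled experience is $r_t + \gamma \max_{a'} Q(s_{t+1}, a')$, so the next state is needed to bootstrap the value estimate. Below is a minimal tabular sketch of this idea; all names and numbers (`n_states`, `gamma`, the hand-written buffer, etc.) are illustrative assumptions, not from the question.

```python
import random
import numpy as np

random.seed(0)

# Illustrative sketch: tabular Q-learning from a replay buffer.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5
Q = np.zeros((n_states, n_actions))

# Replay buffer of experiences e_t = (s_t, a_t, r_t, s_{t+1}).
buffer = [(0, 1, 1.0, 2), (2, 0, 0.0, 3), (3, 1, 2.0, 4)]

for _ in range(100):
    s, a, r, s_next = random.choice(buffer)
    # The TD target uses the NEXT state to bootstrap:
    # target = r + gamma * max_a' Q(s_{t+1}, a')
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```

Without `s_next` stored in each tuple, the `target` line could not be computed when the experience is replayed later, which is why the tuple includes it.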
No accepted answer
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange