Question

Following this explanation of what experience replay is (and others), I noticed that an experience element is defined as

$e_t = (s_t,a_t,r_t,s_{t+1})$

My question is: why do we need the next state in the experience?

To my understanding, our networks learn state-to-action and action-to-reward mappings, so I fail to see where the "next state" is used in experience replay.

No correct solution

Licensed under: CC-BY-SA with attribution