Pergunta

I am going through the Monte Carlo methods, and it's going fine until now. However, I am actually studying the On-Policy First Visit Monte Carlo control for epsilon soft policies, which allows us to estimate the optimal policy in Reinforcement Learning.

Algorithm

I am having troubles understanding the step in blue of the algorithm. Is the pair St, At NEVER supposed to appears in the given set of states ? In this case, the following pseudo-code will never get realized ? What is the specific case where St, At appears in the set of Action-States ?

Please feel free to ask more details if my question isn't clear enough.

The algorithm is taken from the reinforcement learning book written by R.Sutton and A. Barto.

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição
Não afiliado a cs.stackexchange
scroll top