Question

Both AlphaGo and AlphaGo Zero include prior board states as input features (the "Turns Since" planes for AlphaGo, and the repeated 8-step history planes for AlphaGo Zero).
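For concreteness, here is a minimal sketch of the AlphaGo Zero input stack as described in the paper (19×19×17: eight history planes of the current player's stones, eight of the opponent's, plus one constant colour plane). The function name and plane ordering here are illustrative, not taken from any published implementation:

```python
import numpy as np

BOARD = 19
HISTORY = 8  # past board states retained per player

def encode_position(own_history, opp_history, to_play_black):
    """Stack an AlphaGo Zero-style input: 8 planes of the current
    player's stones at t, t-1, ..., t-7, then 8 planes of the
    opponent's stones, then one constant colour-to-play plane."""
    planes = []
    for t in range(HISTORY):
        planes.append(own_history[t])   # own stones, t steps ago
    for t in range(HISTORY):
        planes.append(opp_history[t])   # opponent stones, t steps ago
    colour = np.full((BOARD, BOARD), 1.0 if to_play_black else 0.0)
    planes.append(colour)
    return np.stack(planes, axis=-1)    # shape (19, 19, 17)

# empty-board example: no stones yet, black to play
empty = [np.zeros((BOARD, BOARD)) for _ in range(HISTORY)]
x = encode_position(empty, empty, to_play_black=True)
print(x.shape)  # (19, 19, 17)
```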

What is the purpose of including this history information in the input to the neural networks?

If we ignore the ko rule, the best move in a position should not depend on the history of moves leading up to it. If we don't ignore ko, a single step of board history should suffice in the vast majority of games, so including 8 steps of history seems excessive, and possibly even harmful: if the same position were reached by two different paths, the learned response might not be shared between those paths. A toy example of this is sketched below.
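To make the transposition worry concrete, this toy sketch (with hypothetical coordinates, captures ignored) plays the same three stones in two different orders; the final board is identical, but the stacked history planes differ, so the network sees two distinct inputs:

```python
import numpy as np

BOARD = 19

def history_from_moves(moves, steps=8):
    """Return one player's board snapshots at t, t-1, ..., t-(steps-1),
    given that player's moves in order (no captures modelled)."""
    board = np.zeros((BOARD, BOARD))
    boards = [board.copy()]
    for (r, c) in moves:
        board[r, c] = 1.0
        boards.append(board.copy())
    # walk backwards from the final position, padding with the
    # earliest board once the game is shorter than `steps`
    snapshots = [boards[max(len(boards) - 1 - t, 0)] for t in range(steps)]
    return np.stack(snapshots)

# the same three (hypothetical) moves, played in two different orders
path_a = history_from_moves([(3, 3), (15, 15), (3, 15)])
path_b = history_from_moves([(3, 15), (15, 15), (3, 3)])

print(np.array_equal(path_a[0], path_b[0]))  # True: same final position
print(np.array_equal(path_a, path_b))        # False: history planes differ
```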

This doesn't seem to be discussed in either of the papers, or in any of the media reporting that I have seen.

No correct solution
