Do Nested Monte Carlo Algorithm always find the best score where score is the number of moves on the leftmost path

https://cs.stackexchange.com/questions/87859

05-11-2019
|

Pergunta

I am studying Nested Monte Carlo Algorithm addressing the problem of guiding the search toward better states when there is no available heuristic. It uses nested levels of random games in order to guide the search. It consists in two functions : nested(position, level) and sample(position)

The basic sample function just plays a random game from a given position, we use the function play(position, move) which plays the move in the position and returns the resulting position.

int sample (position)
1  while not end of game # I don't understand why did he wrote a while
2    position = play (position,random move)
3  return score

If it just played a random game from a given position, shouldn't it have been without a while and only this : position = play (position,random move) ? And where does he get the score ?

The Nested Monte-Carlo Search function plays a game, choosing at each step of the game the move that has the highest score of the lower level Nested Monte-Carlo Search. At each step the algorithm

tries all possible moves

plays a nested search at the lower level after each move

memorizes the move associated to the best score of the lower level searches.

As the samples are randomized, it is not guaranteed that a nested search will always improve on previous searches or even lower level searches. In order not to lose the best moves of the best sequence found by a previous search, the algorithm memorizes the best sequence. If none of the moves improve on the best sequence, the move of the best sequence is played, otherwise the best sequence is updated with the newly found sequence and the best move is played:

int nested (position, level)
1  best score = -1
2  while not end of game
3    if level is 1
4      move = argmax_m (sample (play (position, m)))
5    else
6      move = argmax_m (nested (play (position, m), level - 1))
7    if score of move > best score
8      best score = score of move
9      best sequence = seq. after move
10   bestMove = move of best sequence
11   position = play (position,bestMove)
12 return score

In the following example he claims that if depth first search has one chance out of two of choosing the wrong move at the root, A level 1 Nested Monte-Carlo Search will always find the best score.

Yet I don't get why for the first step. I have some difficulties when starting playing move = argmax_m (sample ( play (position, m)))

$$move = \underbrace{argmax_m (\overbrace{sample ( \underbrace{play (position, m)}_{(1)})}^{(2)})}_{(3)}$$

so we move first for a given unkown position m. I believe it is chosen by argmax_m. Yet, how can it chose between both nodes ? Where do we test the score ?
Why randomness here ?
Where can we test the argmax_m ?

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange