Analysis of komi values for increasing Go board sizes and agent strength

https://cs.stackexchange.com/questions/81384

04-11-2019

Question

Komi is the additional number of points given to the non-starting player in the game of Go.

On a 19x19 board, komi is currently 6.5 points under Japanese rules and 7.5 points under Chinese rules. In the past it was lower, even 2.5 points.

A question arises: what value should it be so that the game is fair? Draws are allowed.

(This also covers the case, which I find unlikely, that Go is already fair without komi: one would then simply discover that komi should be 0. So I'm not assuming that Go without komi is unfair.)
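One way to make "fair" precise, as a sketch under assumptions: suppose both players play optimally, the game always terminates, and $S(\sigma_1, \sigma_2)$ denotes the final score margin in favour of the starting player (before komi) when the starting player follows strategy $\sigma_1$ and the other player follows $\sigma_2$. These symbols are introduced here only for illustration.

$$
k^{*} = \max_{\sigma_1} \min_{\sigma_2} S(\sigma_1, \sigma_2)
$$

With komi $k^{*}$, optimal play by both sides ends in a draw, and $k^{*} = 0$ is exactly the case where Go is already fair without komi.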

The following idea came to my mind:

Let's take, e.g., a 5x5 board and an agent A. Let it play many games against itself, track the running average of the points by which the starting player wins at the end of each game, and stop the simulations when that average converges.

Do the same for 6x6 and 7x7 boards, and so on up to 19x19 (or further).

So, for a given agent A, we get a list of mean scores for the starting player on increasing board sizes, e.g. [25, 14.2, 9.6, ...], meaning that on a 5x5 board the first player wins by 25 points on average, on a 6x6 board by 14.2 points on average, and so on. These could be the komi values.
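The procedure above could be sketched roughly as follows. This is only a minimal sketch: `play_game` is a hypothetical placeholder that draws a random margin from a made-up distribution instead of running a real Go engine, and "converged" is interpreted here (an assumption) as the standard error of the mean dropping below a tolerance `tol`.

```python
import math
import random
import statistics


def play_game(board_size, rng):
    """Hypothetical stand-in for one self-play game of agent A.

    Returns the final score margin from the starting player's point of
    view (positive means the starting player won). A real experiment
    would call a Go engine here; this toy version ignores board_size
    and just samples a made-up normal distribution.
    """
    return rng.gauss(5.0, 10.0)  # placeholder numbers, not real Go data


def estimate_mean_margin(board_size, rng, tol=0.5, batch=100, max_games=100_000):
    """Play batches of games until the standard error of the mean is below tol."""
    margins = []
    sem = float("inf")
    while len(margins) < max_games:
        margins.extend(play_game(board_size, rng) for _ in range(batch))
        sem = statistics.stdev(margins) / math.sqrt(len(margins))
        if sem < tol:
            break
    return statistics.fmean(margins), sem, len(margins)


rng = random.Random(0)  # fixed seed so the run is reproducible
komi_estimates = {}
for size in range(5, 20):  # board sizes 5x5 up to 19x19
    mean, sem, games = estimate_mean_margin(size, rng)
    komi_estimates[size] = mean
    print(f"{size}x{size}: mean margin {mean:.2f} +/- {sem:.2f} after {games} games")
```

Stopping on the standard error rather than on "the plot looks flat" gives an explicit error bar for each board size, which makes it easier to tell whether two neighbouring entries in the list really differ.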

Now, maybe the values in the list will always be decreasing. Maybe they converge to some value as the board size grows. Has this already been described somewhere?

The values will probably depend on the agent A. Maybe one could also find a relationship between the strength of agent A and the values in the list. E.g. maybe the stronger the agent, the more slowly the values decrease.

So, one could start with a weak agent and see what values it produces. Then improve agent A (e.g. with some reinforcement learning algorithm, or by giving the search more time) and produce another list of mean values. And so on.

Maybe the variance of the results (or some other function of them) could be a nice measure of how far the current agent is from the optimal one, because optimal play against optimal play always yields exactly one outcome, i.e. variance 0. So, e.g., an agent whose scores against itself fall in [-2, 2] may be closer to optimal than an agent whose scores fall in [-20, 20].
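As a small illustration of the spread idea, the snippet below compares the sample standard deviation of self-play margins for two agents. The margin lists are invented for the example (they are not measurements), and `margin_spread` is a hypothetical helper name.

```python
import statistics


def margin_spread(margins):
    """Return (mean, sample standard deviation) of self-play score margins."""
    return statistics.fmean(margins), statistics.stdev(margins)


# Invented self-play margins from the starting player's point of view.
weak_agent_margins = [-20, 15, -8, 20, 3, -12, 9, -17]
strong_agent_margins = [-2, 1, 0, 2, -1, 1, 2, -2]

weak_mean, weak_sd = margin_spread(weak_agent_margins)
strong_mean, strong_sd = margin_spread(strong_agent_margins)

print(f"weak agent:   mean {weak_mean:.2f}, std dev {weak_sd:.2f}")
print(f"strong agent: mean {strong_mean:.2f}, std dev {strong_sd:.2f}")
```

One caveat: any fully deterministic agent produces the same game every time it plays itself, so its variance is 0 regardless of strength. A small spread is therefore at best a necessary sign of near-optimal play, not a sufficient one, and the measure only makes sense for agents with some randomness in their move choice.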

Say the list for the weakest agent A is [25, 14.2, 9.6, ..., 6.6] and the list for the strongest agent A (ideally one that plays optimally) is [25, 18, 14, ..., 8.2]. Then one knows that a sensible komi for 19x19 lies between 6.6 and 8.2.
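The bracketing step could look like the sketch below. It reuses only the illustrative numbers from the example lists (the elided middle entries are left out), and the helper names `komi_bracket` and `komi_candidates` are hypothetical. Since draws are allowed, both whole and half-point komi values inside the bracket are listed.

```python
import math

# Illustrative mean margins keyed by board size, taken from the example lists.
weak_agent = {5: 25.0, 6: 14.2, 7: 9.6, 19: 6.6}
strong_agent = {5: 25.0, 6: 18.0, 7: 14.0, 19: 8.2}


def komi_bracket(size, *tables):
    """Smallest and largest mean margin observed for one board size."""
    values = [table[size] for table in tables]
    return min(values), max(values)


def komi_candidates(low, high):
    """All komi values in steps of 0.5 that lie inside [low, high]."""
    first = math.ceil(low * 2)
    last = math.floor(high * 2)
    return [k / 2 for k in range(first, last + 1)]


low, high = komi_bracket(19, weak_agent, strong_agent)
candidates = komi_candidates(low, high)
print(f"19x19 komi bracket: [{low}, {high}], candidates: {candidates}")
```

With more than two agents, the same function takes one table per agent, so the bracket can be tightened as stronger agents are added.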

My questions are:

1) Have you seen something similar done before? I would be happy to read about it. I've read Solving Go for Rectangular Boards, which solves Go up to 5x5. If such an analysis hasn't been described, I could do one out of curiosity.

2) Do you see any problems, refinements, or other interesting things to check?

No accepted answer

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange