Minimax algorithm

https://stackoverflow.com/questions/1588738

22-09-2019
|

Question

I have a simple question regarding the Minimax algorithm: for example for the tic-tac-toe game, how do I determine the utility function's for each player plays? It doesn't do that automatically, does it? I must hard-code the values in the game, it can't learn them by itself, does it?

Solution

No, a MiniMax does not learn. It is a smarter version of a brute-force tree search.

OTHER TIPS

Typically you would implement the utility function directly. In this case the algorithm would not learn how to play the game, it would use the information that you had explicitly hard-coded in the implementation.

However, it would be possible to use genetic programming (GP) or some equivalent technique to automatically derive a utility function. In this case you would not have to encode any explicit strategy. Instead the evolution would discover its own way of playing the game well.

You could either combine your minimax code and the GP code into a single (probably very slow) adaptive program, or you could run the GP first, find a good utility function and then add this function to your minimax code just as you would any hand-coded function.

Tic-Tac-Toe is small enough to run the game to the end and assign 1 for win, 0 for draw and -1 for lose.

Otherwise you have to provide a function which determines the value of a position heuristically. In chess for example a big factor is the value of the material, but also who controls the center or how easily the pieces can move.

As for learning, you can add weight factors to different aspects of the position and try to optimize those by repeatedly playing games.

How do determine the utility function for each play?

Carefully ;-) This article shows how a slightly flawed evaluation function (one for ex. which either doesn't go "deep" enough in looking ahead in the tree of possible plys, or one which fails to capture the relative strengh of some board positions) results in an overall weak algorithm (one that looses more often).

it can't learn them by itself, does it?

No, it doesn't. There are ways, however, to make the computer learn the relative strength of board positions. For example by looking into Donald Mitchie and his MENACE program you'll see how a stochastic process can be used to learn the board without any a priori knowledge but the rules of the game. The funny part is that while this can be implemented in computers, a few hundred colored beads and match boxes are all that is required, thanks to the relatively small size of the game space, and also thanks to various symmetries.

After learning such a cool way of teaching the computer how to play, we may not be so interested in going back to MinMax as applied to Tic-Tac-Toe. After all MinMax is a relatively simple way of pruning a decision tree, which is hardly needed with tic-tac-toe's small game space. But, if we must ;-) [go back to MinMax]...

We can look into the "matchbox" associated with the next play (i.e. not going deep at all), and use the percentage of beads associated with each square, as an additional factor. We can then evaluate a traditional tree, but only going, say 2 or 3 moves deep (a shallow look-ahead depth which would typically end in usually in losses or draws) and rate each next move on the basis of the simple -1 (loss), 0 (draw/unknown), +1 (win) rating. By then combining the beads percentage and the simple rating (by say addition, certainly not by multiplication), we are able to effectively use MinMax in a fashion that is more akin to the way it is used in cases when it is not possible to evaluate the game tree to its end.

Bottom line: In the case of Tic-Tac-Toe, MinMax only becomes more interesting (for example in helping us explore the effectiveness of a particular utility function) when we remove the deterministic nature of the game, associated with the easy evaluation the full tree. Another way of making the game [mathematically] interesting is to play with a opponent which makes mistakes...

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow