TicTacToe strategic reduction

https://stackoverflow.com/questions/2989259

24-10-2019
|

문제

I decided to write a small program that solves TicTacToe in order to try out the effect of some pruning techniques on a trivial game. The full game tree using minimax to solve it only ends up with 549,946 possible games. With alpha-beta pruning, the number of states required to evaluate was reduced to 18,297. Then I applied a transposition table that brings the number down to 2,592. Now I want to see how low that number can go.

The next enhancement I want to apply is a strategic reduction. The basic idea is to combine states that have equivalent strategic value. For instance, on the first move, if X plays first, there is nothing strategically different (assuming your opponent plays optimally) about choosing one corner instead of another. In the same situation, the same is true of the center of the walls of the board, and the center is also significant. By reducing to significant states only, you end up with only 3 states for evaluation on the first move instead of 9. This technique should be very useful since it prunes states near the top of the game tree. This idea came from the GameShrink method created by a group at CMU, only I am trying to avoid writing the general form, and just doing what is needed to apply the technique to TicTacToe.

In order to achieve this, I modified my hash function (for the transposition table) to enumerate all strategically equivalent positions (using rotation and flipping functions), and to only return the lowest of the values for each board. Unfortunately now my program thinks X can force a win in 5 moves from an empty board when going first. After a long debugging session, it became apparent to me the program was always returning the move for the lowest strategically significant move (I store the last move in the transposition table as part of my state). Is there a better way I can go about adding this feature, or a simple method for determining the correct move applicable to the current situation with what I have already done?

해결책

You're on the right track when you're thinking about reflections and rotations. However, you're applying it to the wrong place. Don't add it to your transposition table or your transposition table code -- put it inside the move generation function, to eliminate logically equivalent states from the get-go.

Keep your transposition table and associated code as small and as efficient as possible.

다른 팁

My gut feeling is that you are using too big of a hammer to attack this problem. Each of the 9 spots can only have one of two labels: X or O or empty. You have then at most 3^9 = 19,683 unique boards. Since there are 3 equivalent reflections for every board, you really only have 3^9 / 4 ~ 5k boards. You can reduce this by throwing out invalid boards (if they have a row of X's AND a row of O's simultaneously).

So with a compact representation, you would need less than 10kb of memory to enumerate everything. I would evaluate and store the entire game graph in memory.

We can label every single board with its true minimax value, by computing the minimax values bottom up instead of top down (as in your tree search method). Here's a general outline: We compute the minimax values for all unique boards and label them all first, before the game starts. To make the minimax move, you simply look at the boards succeeding your current state, and pick the move with the best minimax value.

Here's how to perform the initial labeling. Generate all valid unique boards, throwing out reflections. Now we start labeling the boards with the most moves (9), and iterating down to the boards with least moves (0). Label any endgame boards with wins, losses, and draws. For any non-endgame boards where it's X's turn to move: 1) if there exists a successor board that's a win for X, label this board a win; 2) if in successor boards there are no wins but there exists a draw, then label this board a draw; 3) if in successor boards there are no wins and no draws then label this board a loss. The logic is similar when labeling for O's turn.

As far as implementation goes, because of the small size of the state space I would code the "if there exists" logic just as a simple loop over all 5k states. But if you really wanted to tweak this for asymptotic running time, you would construct a directed graph of which board states lead to which other board states, and perform the minimax labeling by traversing in the reverse direction of the edges.

You need to return the (reverse) transposition along with the lowest value position. That way you can apply the reverse transposition to the prospective moves in order to get the next position.

Why do you need to make the transposition table mutable? The best move does not depend on the history.

There is a lot that can be said about this, but I will just give one tip here which will reduce your tree size: Matt Ginsberg developed a method called Partition Search which does equivalency reductions on the board. It worked well in Bridge, and he uses tic-tac-toe as an example.

You may want to try to solve tic-tac-toe using monte-carlo simulation. If one (or both) of the players is a machine player, it could simply use the following steps (this idea comes from one of the mini-projects in the coursera course Principles of Computing 1 which is a part of the Specialization Fundamentals of Computing, taught by RICE university.):

Each of the machine players should use a Monte Carlo simulation to choose the next move from a given TicTacToe board position. The general idea is to play a collection of games with random moves starting from the position, and then use the results of these games to compute a good move.

When a paritular machine player wins one of these random games, it wants to favor the squares in which it played (in hope of choosing a winning move) and avoid the squares in which the opponent played. Conversely, when it loses one of these random games, it wants to favor the squares in which the opponent played (to block its opponent) and avoid the squares in which it played.

In short, squares in which the winning player played in these random games should be favored over squares in which the losing player played. Both the players in this case will be the machine players.

The following animation shows a game played between 2 machine players (that ended in a tie), using 10 MC trials at each board state to determine the next move.

It shows how each of the machine players learns to play the game just by using Monte-Carlo Simulation with 10 trials (a small number of trials) at every state of the board, the scores shown at the right bottom of each grid square are used by each of the players at their corresponding turns, to choose its next move (the brighter cells represent better moves for the current player, as per the simulation results).

Here is my blog on this for more details.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow