Game Playing AI - Strategy to overcome the transposition table taking up too much memory?

https://softwareengineering.stackexchange.com/questions/384800

18-02-2021
|

Pergunta

I've made an engine that plays Connect Four using the standard search algorithms (minimax, alpha-beta pruning, iterative deepening, etc). I've implemented a transposition table so that the engine can immediately evaluate a same position reached via a different move order, and also to allow for move-ordering.

The problem with the TT is that on each step of the iterative deepening process, the amount of bytes it takes up grows by at least double. The TT stores objects that represent important info for a given position. Each of these objects is 288 bytes. Below, depth limit is how far the engine searches on each step of iterative deepening:

depth limit = 1 - TT size = 288 bytes (since just one node/position looked at).

depth limit = 2 - TT size = 972 bytes.

depth limit = 3 - TT size = 3708 bytes.

depth limit = 4 - TT size = 11664 bytes

depth limit = 5 - TT size = 28476 bytes.

....

depth limit = 12 - TT size = 11,010,960 bytes.

depth limit = 13 - TT size = 22,645,728 bytes.

And now at this point the .exe file crashes.

I'm wondering what can be done about this problem. In Connect Four the branching factor is 7 (since there are usually 7 possible moves that can be made, at least in the beginning of the game). The reason the TT isn't growing by a factor of 7 on each step is due to pruning methods I've implemented.

This problem isn't a big deal if the engine only searches up to depth 8-9, but my goal is to get it to refute Connect Four by going all the way to the end (so depth limit = 42).

Solução

What you fundamentally have is a cache for computed values of an expensive function. As is always the case with such schemes, you will need to decide on an eviction policy for entries in the hash table. Which one you select will depend on the actual structure of the search (you obviously want to maximize cache hits) but least recently used (or LRU) is a good place to start. FIFO and randomized strategies are also common.

The whole idea of LRU is in the name: when you run out of memory in the cache (transposition table), overwrite the oldest entry with the new one. In the context of an alpha-beta search, storing only the most recently computed upper bounds in the table will keep the search sound while still effectively pruning the search and keeping a good balance between storage and recomputation.

Also, a better representation for the board exists. You can do it in 49 bits (7 bytes packed). Here's how. The columns are six cells tall. Represent each column with seven bits, so that the first bit is the space above the column. Mark the first empty space with a 1, those before it 0, and the remaining bits are 0 or 1 for black or red (respectively). Thus each of the seven columns can be unambiguously represented in seven bits, and the whole board (of seven columns) in 49 bits.

Unlike the 10 byte solution given below, these bits can fit in a 64-bit CPU's registers. Equality and hash operations thus become trivial and extremely cheap to compute.

Outras dicas

State

Look at the board, 288bytes is monstrously large.

A Typical board is 7x6 tiles, each tile being in one of 3 states: empty, red, yellow.

Taking a simple bit-packing approach each tile can be represent by 2 bits, there are 42 tiles, allowing the game state to be expressed in 84bits or 10bytes.

Illegal games can be represented by setting each tile to the fourth state (2bits gives the states 00, 01, 10, 11). I'd pick:

enum {
  empty = 0x00,
  yellow = 0x01,
  red = 0x02,
  error = 0x03
}

A game can be quickly checked for illegal state by a bit comparison:

byte[8] game_state;
for (int i = 0; i != 10; ++i)
    if (0 != ((game_state[i] & 0x55) & ((game_state[i] & 0xAA) >> 1))
        return error; //one of the tiles in the game is illegal...
return good;

Similarly the players turn can be deduced by counting the number of yellow player tiles modulo 2 (xor each bit) if the answer is 0 then it is yellows turn, otherwise it is reds.

byte[10] game_state;
byte bcount = (game_state[0] & 0x55)
            ^ (game_state[1] & 0x55)
            ^ (game_state[2] & 0x55)
            ^ (game_state[3] & 0x55)
            ^ (game_state[4] & 0x55)
            ^ (game_state[5] & 0x55)
            ^ (game_state[6] & 0x55)
            ^ (game_state[7] & 0x55)
            ^ (game_state[8] & 0x55)
            ^ (game_state[9] & 0x55)
            ;
bcount = ((bcount & 0x50) >> 4) ^ (bcount & 0x05));
return ((bcount & 0x04) >> 2) == (bcount & 0x01))
    ? yellow_turn
    : red_turn
    ;

As each state has 7 moves mixed between legal (no tile is error) and illegal (tiles are error) then a node in the decision tree can be expressed by eight game states. The first state is the starting condition, the other 7 representing the move for placing a token in the ith column, each 10byte state is the key for the next decision node (unless its an error). This gives 80bytes total per decision node.

Space

The game state space is huge.

7 columns
each column can have stacks between 0 to 6 tiles high.
each occupied state tile has two states.

Ignoring the game terminating due to a victory, the game state-space is: (Ex=0..6(2^x))^7. That is 532875860165503 which with a decision node size 80bytes gives a storage requirement of at most: 38.771 PetaBytes. Even with extensive filtering that is going to be hard to deal with.

With your greater knowledge of the game, you can probably reign in that upper-bound a lot. The point is that there is no computer system available with that much main memory or even memory period - that most of us mere mortals can currently access.

Paging

If you have the Hard-Drive space, then B-Tree or B*-Tree to the rescue. Preferably find a third-party library which offers file-paged nodes, make one only if you must.

The key is the 10byte game-state that the decision node is based on. Binary sorting should be sufficient.

If the B-Tree is styled as a pure map, it may be unnecessary to store the starting game state (the key) with the outcomes for that state.

With this data-structure you should be able to cache a lot of game-states. Unfortunately this is probably insufficient for your needs.

State Patterns and compression

The game has a monotonic board, but when it comes to making a move the only interesting state are the tiles which are candidates for making a victory-line. You could rename the error code to ignored and mark the uninteresting tiles. A game with every tile marked as ignored is by definition unwinnable. This would work to compress the game-state space further, and has the side-benefit of increasing the overlap of game-state reducing further computation.

Rules of thumb

Consider heuristics to try and greedily select the most likely to win paths. Moves which directly contribute to victory, or counter-defeat should be considered first. Moves that hinge on the other player acting irrationally should be delayed. This should promote a mixed breadth/depth exploration.

Also consider a bounded, forgetful, priority queue. Only the most interesting next game states stay on the queue. Should the queue run dry, or the quality of the states drops below the best forgotten state consider reassembling it from a B-Tree walk. For simplicity forget unprocessed game-states to a log which can easily be reprocessed to update the queue.

Finally, give up trying to store the full decision tree. Use a Look-aside cache and regenerate forgotten state.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange