Question

I hope I am asking this in the right community.

I am building an Artificial Intelligence to play the board game Pandemic. The AI uses a Monte Carlo algorithm. After implementing the game rules and the AI, it works as expected (very poorly, but it works).

I figured out that the main problem with the current algorithm is action dispersion. In Pandemic, each player is allowed up to 4 actions per turn, and the number of possible actions is usually between 10 and 100. The current algorithm runs 1000 Monte Carlo simulations to choose the best next action, performs that action, and starts again. This results in a lot of wasted simulation: even when the AI ends up playing action A, it has still spent simulations on every action that is not A.

I think it would be much more efficient to compute all the valuable series of 4 actions up front, as there are usually fewer than 50 efficient/valuable series of 4 actions to play. But my current implementation is way too slow and memory-inefficient.

Here is a recap of the rules (simplified):

  • there are 48 cities on the board, each city having between 3 and 5 neighbours. The graph is always the same.
  • as part of the disease-spreading mechanism, each city holds between 0 and 3 disease cubes
  • a player has between 0 and 7 city cards in his hand
  • a player moves between cities

The possible actions are:

  • MOVE: the player moves from the city he is in to a neighbouring city
  • HEAL: the player removes one cube from the city he is in
  • CHARTER FLIGHT: if the player has the card of the city he is in, he can discard it to move to any other city
  • DIRECT FLIGHT: the player can discard any card from his hand to move to that card's city
  • SHARE KNOWLEDGE: the player can give or take the card of the city he is in to/from another player in the same city
  • BUILD: the player can discard the card of the city he is in to build a Research Center
  • MOVE BETWEEN RESEARCH CENTERS: if there is a research center in the city the player is in, he can move to any other research center

Implementation

The implementation needs to run very fast and use little memory, in order for the Monte Carlo simulation to run smoothly. It should only keep the series of 4 actions that have different and valuable results (see the sketch after this list):

  • if two series of 4 actions have the same effect except that one requires discarding a card, then the second series must be ignored
  • if a series of 4 actions has an effect strictly inferior to the effect of another series, then the first series must be ignored.
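
Here is a minimal sketch of these two pruning rules, assuming the effect of a series can be summarized as a hashable value (cubes removed per city, research centers built, cards traded) together with the number of cards it discards. Every name and field below is a hypothetical illustration, not your actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Effect:
    cubes_removed: tuple       # sorted ((city, count), ...) pairs
    builds: frozenset          # cities where a research center was built
    trades: frozenset          # cards given or taken

def dominates(a, b):
    """True if effect `a` achieves at least everything `b` does."""
    a_cubes, b_cubes = dict(a.cubes_removed), dict(b.cubes_removed)
    return (all(a_cubes.get(city, 0) >= n for city, n in b_cubes.items())
            and a.builds >= b.builds
            and a.trades >= b.trades)

def prune_series(candidates):
    """candidates: iterable of (effect, cards_discarded, series) tuples."""
    # Rule 1: among series with the same effect, keep the cheapest in cards.
    best = {}
    for effect, cards, series in candidates:
        if effect not in best or cards < best[effect][0]:
            best[effect] = (cards, series)
    # Rule 2: drop any series whose effect is dominated by another kept one.
    return [series for effect, (cards, series) in best.items()
            if not any(e != effect and dominates(e, effect) for e in best)]
```

The quadratic scan in rule 2 is fine at this scale, since the goal is to end up with fewer than 50 series.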

I am looking for help regarding the strategy to use to determine all the valuable series of 4 actions, and for any algorithm or implementation ideas that could improve on my current work.

What I tried so far

The direct approach

Given an action, perform all possible next actions on a duplicated game state, store the resulting game states, and only keep the series of 4 actions that have different outputs. Unfortunately, this requires cloning the game state far too many times. It is also not optimised: if two different first actions produce the same output, there is no need to explore both branches.

The direct approach optimised

For each action, evaluate the resulting state of the board and only keep the actions that have different outputs; then do the same again for each kept output. This is slightly better, but the game state duplication is still a problem.
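
As a reference point, this optimised variant might look like the sketch below. The GameState API (clone, legal_actions, apply, fingerprint) is a hypothetical stand-in for your own implementation, where fingerprint() returns a hashable summary of the state:

```python
def enumerate_series(root, depth=4):
    """Enumerate series of `depth` actions, skipping a branch as soon as
    its intermediate state duplicates one already seen at the same depth."""
    seen, results = set(), []

    def expand(state, series):
        if len(series) == depth:
            results.append((tuple(series), state))
            return
        for action in state.legal_actions():
            child = state.clone()            # the costly duplication
            child.apply(action)
            key = (len(series) + 1, child.fingerprint())
            if key in seen:                  # same output at the same depth:
                continue                     # only one branch is explored
            seen.add(key)
            expand(child, series + [action])

    expand(root, [])
    return results
```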

The loop over all possible actions approach

Pre-build the list of all possible series of 4 actions, independently of the board state, then loop through the list and remove the series that are not playable. This was too slow: without the game state constraints, there are around 1 billion possible series of 4 actions.

The cancel approach

Similar to the direct approach, but instead of duplicating the game state before performing an action, we perform the action, explore the branch, and then cancel the action to try another branch. This was a bit better than the other approaches, but it is impossible to cut useless branches before reaching the end.
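
A sketch of the cancel approach, under the assumption that apply() returns whatever undo() needs to restore the state exactly (the removed cube, the discarded card, the previous position); this do/undo API is hypothetical:

```python
def enumerate_series_undo(state, depth=4):
    """Enumerate all series of `depth` actions with one mutable state."""
    series, results = [], []

    def expand():
        if len(series) == depth:
            results.append(tuple(series))
            return
        for action in state.legal_actions():
            undo_info = state.apply(action)   # mutate in place, no clone
            series.append(action)
            expand()
            series.pop()
            state.undo(action, undo_info)     # restore the previous state

    expand()
    return results
```

Note that the duplicate cut from the direct approach can be grafted onto this one: fingerprint the mutable state after each apply() and skip the branch if the (depth, fingerprint) pair has already been seen.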

The delta approach

Each action is associated with a Delta object, and when performing two actions their Deltas are added together. Only series of 4 actions with different total Deltas are kept. This solves the game state duplication issue, but it is slower to run: before exploring a branch, we must make sure that the branch's action is still playable once the deltas of the previous actions are applied to the game state.
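
A sketch of what such a Delta object might look like; the fields are hypothetical, and the frozen dataclass makes composed deltas hashable, so series with the same total Delta can be dropped with a simple set:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delta:
    position: str | None = None        # new position, None if unchanged
    cubes: tuple = ()                  # sorted ((city, change), ...) pairs
    discards: frozenset = frozenset()  # cards spent by the series
    builds: frozenset = frozenset()    # research centers added

    def compose(self, other):
        """Delta of playing `self` first, then `other`."""
        cubes = {}
        for city, change in self.cubes + other.cubes:
            cubes[city] = cubes.get(city, 0) + change
        return Delta(
            position=other.position if other.position else self.position,
            cubes=tuple(sorted((c, n) for c, n in cubes.items() if n)),
            discards=self.discards | other.discards,
            builds=self.builds | other.builds,
        )
```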

The goal approach

Not yet implemented. It would determine each positive action available on the board (HEAL, BUILD, SHARE KNOWLEDGE) and try to build series of 4 actions achieving those goals.


Solution

Before going into more detail, it is important to point out that Pandemic is a cooperative game for 2 to 4 players. Players are allowed to discuss strategy and share any information (even if the rules are ambiguous about sharing the contents of one's hand). Moreover, to achieve victory efficiently, players have to trade cards, which is hard without coordination.

Type of AI

So first of all, you have to be clear about what you expect from your AI:

  • Global AI: the AI plays for all players and therefore never needs to predict another player's turn. This is the simplest to implement.
  • Independent equivalent AI: nearly the same, but with a different paradigm. Each player is managed by a different bot, and every bot can perfectly predict the others' turns.
  • Independent AI: this is probably your objective, as it lets a human player participate. But it requires predicting what the other players would do, or simulating communication.

Evaluation function

Now let's talk about the algorithm, beginning with the evaluation function. As you said, you have to compute the game state resulting from any action you may take, then evaluate it so as to give a score to every action sequence and choose the best one.

Computing the full new state can indeed be very time- and space-consuming. You can probably work with a "delta" from the initial state; you then have to be able to run your evaluation function on this "delta" to get a "delta" score.
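
For instance, reusing a Delta shape like the one sketched in the question, a "delta" score only has to weigh the changes; the weights below are invented for illustration, not tuned values:

```python
CUBE_WEIGHT = 10    # each cube removed reduces the outbreak risk
CARD_WEIGHT = -3    # each discarded card is a resource spent
BUILD_WEIGHT = 15   # a research center helps every future turn

def delta_score(delta):
    """Score a state change without rebuilding the full state."""
    cubes_removed = sum(-change for _, change in delta.cubes)
    return (cubes_removed * CUBE_WEIGHT
            + len(delta.discards) * CARD_WEIGHT
            + len(delta.builds) * BUILD_WEIGHT)
```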

A tip for handling the 4-action sequence: first determine all accessible positions and their cost in actions/cards, then decide what is best to do with the remaining actions.

Having a good evaluation function is the most important part of any game AI. It takes experience and an abstraction of the game to model which state changes are likely to lead to victory or defeat. Unfortunately, Pandemic is a very hard and complex game, and you might need experience with simpler games before you can design a proper evaluation function.

Prediction algorithm

As you said, if you limit the computation to your next 4 actions, there is no sense in using MCTS. But to get a better prediction, you can compute the states resulting from the next players' turns and from the random events.

Generally, the depth of the analysis depends on the time you are willing to spend on it. Optimizing your state computation (with "deltas", for instance) lets you go much deeper in the analysis.

An important feature is to "prune" any inefficient branch as soon as possible. This is exactly what you mean by "if two series of 4 actions have the same effect except that one requires discarding a card, then the second series must be ignored". In practice, you have to express this rule in terms of the state score.

Knowing the game of Pandemic well, I would expect a decent AI to predict at least the next turn of every other player. As you can imagine, it is not worth spending a full turn of 4 actions doing something that the next player can do in 1 action.

My advice

Start simple: build a 2-player global AI and assume random card draws are totally predictable. Then improve your capacity to work with state "deltas", your MCTS, and above all your evaluation function.

Sequence management

The idea is to first compute all positions in range (a BFS is perfect for this). The "range cost" is expressed in "cards(C)/actions(A)". A "1C2A" range is strictly worse than a "0C2A" range, but you may have to keep 2 ranges for the same position when neither dominates the other, for instance "1C1A" and "0C2A". A sketch of this computation follows.
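
Here is a sketch of that BFS, keeping a Pareto front of (cards, actions) costs per city. graph maps each city to its neighbours and hand holds the names of the player's city cards; as a simplification the hand is treated as constant, whereas a full version must remove a discarded card before continuing the search:

```python
from collections import deque

def ranges(start, hand, graph, max_actions=4):
    """For every reachable city, the non-dominated (cards, actions) costs."""
    fronts = {start: {(0, 0)}}
    queue = deque([(start, 0, 0)])
    while queue:
        city, cards, actions = queue.popleft()
        if actions == max_actions:
            continue
        steps = [(n, cards) for n in graph[city]]             # MOVE
        steps += [(c, cards + 1) for c in hand if c != city]  # DIRECT FLIGHT
        for dest, new_cards in steps:
            cost = (new_cards, actions + 1)
            front = fronts.setdefault(dest, set())
            if any(c <= cost[0] and a <= cost[1] for c, a in front):
                continue          # dominated, e.g. "1C2A" when "0C2A" exists
            front -= {p for p in front
                      if p[0] >= cost[0] and p[1] >= cost[1]}
            front.add(cost)
            queue.append((dest, new_cards, actions + 1))
    return fronts
```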

To build a 4-action sequence, you then have to consider all the positions you can reach within 4 actions. At each of these positions, you will either:

  • end your turn (if you have used all 4 actions, for instance)
  • do a useful action (cure, trade, build)

In the first case, you won't move again, as that would be equivalent to another direct move from your initial position list. But if you do an action, you may have a subsequent move (or action), which needs a 2nd depth level in your exploration tree, with a lower range. A sketch of this exploration is given below.
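
Put together, the exploration might look like the sketch below, built on the ranges() function from the previous section. useful_actions(city) is a hypothetical hook returning the positive actions available there (cure, trade, build), each assumed to cost 1 action:

```python
def build_sequences(start, hand, graph, actions_left=4, prefix=()):
    """Enumerate turn plans of the form move / useful action / move / ..."""
    plans = [prefix] if prefix else []   # ending the turn is always an option
    for city, front in ranges(start, hand, graph, actions_left).items():
        for cards, used in front:
            remaining = actions_left - used
            if remaining < 1:
                continue                 # no action left after this move
            for action in useful_actions(city):
                step = prefix + (("goto", city, cards), action)
                # the action consumed 1 of the remaining actions: recurse
                # with a lower range from the new position
                plans += build_sequences(city, hand, graph,
                                         remaining - 1, step)
    return plans
```

Pure repositioning turns (a move with no action) never recurse here; they are the level-1 leaves, and can be scored directly from the position delta, as in the example below.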

Let's take a simple example. You are at position "O" in this graph:

"B1" <= "A1" <= "O" => "A2" => "B2" => "C2" => "D2"

There are possible actions only at O (a cure), A1 (2 cures), B2 (a trade) and C2 (a build).

So your first-level tree is [(position, remaining actions)]:

(O, 4), (A1, 3), (A2, 3), (B1, 2), (B2, 2), (C2, 1), (D2, 0)

If you stay at O, you will first do an action, namely the cure, before moving again, and you may build the subtree:

(O, 3), (A1, 2), (A2, 2), (B1, 1), (B2, 1), (C2, 0)

If you go to A2, you won't move again, because there is no action to do there and every accessible position is already considered in the tree. So you can directly compute the delta score of this state from the delta of your pawn's position.

If you go to A1, you will do either 1 cure, with the remaining subtree (A1, 2), (B1, 1), (O, 1), (A2, 0), or 2 cures, with the remaining subtree (A1, 1), (B1, 0), (O, 0).

And so on... Also note that there is no point in using a charter flight (discarding the card of the city you are in to go anywhere) to reach a position where you can perform no action. This limits the size of the tree.
