Minimax / Alpha Beta for Android Reversi Game

Question

First, you can check this piece of code for a checkers AI that I wrote years ago. The interesting part is the last function (alphabeta). It's in Python, but I think you can read it like pseudocode.

Obviously I cannot teach you all the alpha-beta theory here because it can be a little tricky, but I can give you some practical tips.

Evaluation Function

This is one of the key points of a good minimax / alpha-beta algorithm (and of any other informed search algorithm). Writing a good heuristic function is the artistic part of AI development. You have to know the game well and talk with expert players to understand which board features matter when answering the question: how good is this position for player X?

You have already indicated some good features, like mobility, stability and free corners. Note, however, that the evaluation function has to be fast because it will be called a huge number of times.

A basic evaluation function is

H = f1 * w1 + f2 * w2 + ... + fn * wn

where each fi is a feature score (for example, the number of free corners) and wi is the corresponding weight, which says how important that feature is in the total score.

There is only one way to find good weight values: experience and experiments. ;)
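As a minimal sketch of the weighted sum above, the evaluation could look like this in Python. The feature names and the weight values are made up for illustration and are not tuned for any real game.

```python
def evaluate(features, weights):
    """Return the weighted sum H = f1*w1 + f2*w2 + ... + fn*wn."""
    return sum(features[name] * weights[name] for name in weights)


# Hypothetical feature scores for one position and hand-picked weights.
features = {"mobility": 4, "stability": 2, "corners": 1}
weights = {"mobility": 1.0, "stability": 2.0, "corners": 10.0}

score = evaluate(features, weights)
print(score)  # 4*1.0 + 2*2.0 + 1*10.0 = 18.0
```

Keeping the weights in a separate structure makes it easy to run experiments with different values without touching the feature code.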

The Basic Algorithm

Now you can start with the algorithm. The first step is to understand how to navigate the game tree. In my AI I simply used the main board as a blackboard on which the AI can try out moves.

For example, we start with the board in a certain configuration B1.

Step 1: get all the available moves. You have to find all the moves applicable to B1 for a given player. In my code this is done by self.board.all_move(player). It returns a list of moves.
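The internals of all_move are not shown here, and my code was written for checkers. For Reversi, a move generator might look roughly like the sketch below. The board representation (a list of lists with "B", "W" and "."), the helper names flips and start_board, and the standalone function form are all assumptions made for this example.

```python
# The eight directions in which discs can be captured.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def start_board():
    """Standard Reversi opening position on an 8x8 board."""
    board = [["." for _ in range(8)] for _ in range(8)]
    board[3][3], board[4][4] = "W", "W"
    board[3][4], board[4][3] = "B", "B"
    return board


def flips(board, row, col, player):
    """Return the opponent discs that `player` would flip by playing (row, col)."""
    if board[row][col] != ".":
        return []
    opponent = "W" if player == "B" else "B"
    flipped = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk over a run of opponent discs...
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...which counts only if it is closed by one of our own discs.
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(line)
    return flipped


def all_move(board, player):
    """A move is legal when it flips at least one opponent disc."""
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]


print(all_move(start_board(), "B"))  # [(2, 3), (3, 2), (4, 5), (5, 4)]
```

The length of the returned list is also a cheap mobility feature for the evaluation function.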

Step 2: apply the move and start recursion. Assume that the function has returned three moves (M1, M2, M3).

1. Take the first move M1 and apply it to obtain a new board configuration B11.
2. Apply the algorithm recursively to the new configuration (find all the moves applicable in B11, apply them, recurse on the result, ...).
3. Undo the move to restore the B1 configuration.
4. Take the next move M2 and apply it to obtain a new board configuration B12.
5. And so on.

NOTE: Undoing the move (the third item above) works only if all the moves are reversible. Otherwise you have to find another solution, such as allocating a new board for each move.

In code:

# MAX node: try every move on the shared board, then restore it.
for mov in moves:
    self.board.apply_action(mov)
    v = max(v, self.alphabeta(alpha, beta, level - 1, self._switch_player(player), weights))
    self.board.undo_last()  # back to the B1 configuration
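How apply_action and undo_last work internally is not shown here. One possible way to make moves reversible is to push whatever each move overwrote onto a history stack. The sketch below assumes a toy one-dimensional board (not real Reversi or checkers rules); the class name ToyBoard and the (index, player) move format are made up for this example.

```python
class ToyBoard:
    """A toy board used only to illustrate apply/undo with a history stack."""

    def __init__(self, size):
        self.cells = [None] * size
        self.history = []  # stack of (index, previous value)

    def apply_action(self, move):
        index, player = move
        # Remember what was there before, so the move can be reverted.
        self.history.append((index, self.cells[index]))
        self.cells[index] = player

    def undo_last(self):
        index, previous = self.history.pop()
        self.cells[index] = previous


board = ToyBoard(4)
before = list(board.cells)
board.apply_action((1, "X"))
board.apply_action((2, "O"))
board.undo_last()
board.undo_last()
print(board.cells == before)  # True
```

In Reversi the history entry would also have to record the flipped discs so that they can be flipped back.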

Step 3: stop the recursion. This tree is very deep, so you have to put a search limit on the algorithm. A simple way is to stop the recursion after n levels. For example, I start with B1, max_level=2 and current_level=max_level.

From B1 (current_level 2) I apply, for example, the M1 move to obtain B11.
From B11 (current_level 1) I apply, for example, the M2 move to obtain B112.
B112 is a "current_level 0" board configuration, so I stop the recursion. I return the value of the evaluation function applied to B112 and go back to level 1.

In code:

# Depth limit reached: score the current board instead of recursing.
if level == 0:
    return self.board.board_score(weights)
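Putting steps 1 to 3 together, a depth-limited minimax without any pruning could be sketched as follows. To keep the example self-contained it uses a toy Nim-like game instead of Reversi (take 1 to 3 stones, whoever takes the last stone wins). The class NimBoard, the player labels "A" and "B", and the neutral score 0 at the depth limit are assumptions for this sketch; only the method names all_move, apply_action and undo_last mirror my code.

```python
class NimBoard:
    """Toy game: take 1-3 stones per turn; taking the last stone wins."""

    def __init__(self, stones):
        self.stones = stones
        self.history = []

    def all_move(self, player):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def apply_action(self, move):
        self.history.append(move)
        self.stones -= move

    def undo_last(self):
        self.stones += self.history.pop()


def minimax(board, level, player, me):
    """Value of the position for `me`, searching at most `level` plies."""
    moves = board.all_move(player)
    if not moves:
        # No stones left: the previous player took the last one and won.
        return -1 if player == me else 1
    if level == 0:
        return 0  # depth limit reached: neutral heuristic value
    other = "B" if player == "A" else "A"
    values = []
    for mov in moves:
        board.apply_action(mov)
        values.append(minimax(board, level - 1, other, me))
        board.undo_last()
    return max(values) if player == me else min(values)


print(minimax(NimBoard(5), 10, "A", "A"))  # 1: the player to move can force a win
print(minimax(NimBoard(4), 10, "A", "A"))  # -1: the player to move loses
```

With a small depth limit the search sees nothing decisive: minimax(NimBoard(4), 1, "A", "A") returns the neutral value 0, which is where a real evaluation function would have to do the work.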

Now... the standard algorithm pseudocode returns only the value of the best leaf. But I want to know which move brings me to the best leaf! To do this you have to find a way to map leaf values to moves. For example, you can save move sequences: starting from B1, the sequence (M1 M2 M3) brings the player to the board B1123 with value -1; the sequence (M1 M2 M2) brings the player to the board B1122 with value 2; and so on... Then you can simply select the move that brings the AI to the best position.
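A simpler alternative to storing whole move sequences is to remember, at the root only, which first move produced which value. The sketch below shows that variant, not the sequence-saving approach described above. It assumes a hypothetical, hand-written game tree given as nested dictionaries, where keys are move names and leaves are evaluation values.

```python
# Hypothetical game tree: MAX moves at the root, MIN replies, leaves are scores.
tree = {
    "M1": {"M1": 3, "M2": 5},
    "M2": {"M1": -1, "M2": 9},
    "M3": {"M1": 2, "M2": 2},
}


def minimax(node, maximizing):
    """Return the minimax value of a node (a leaf is a plain number)."""
    if not isinstance(node, dict):
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)


def best_move(root):
    """Return (move, value) for the maximizing player at the root."""
    best, best_value = None, float("-inf")
    for move, child in root.items():
        value = minimax(child, maximizing=False)  # the opponent replies next
        if value > best_value:
            best, best_value = move, value
    return best, best_value


print(best_move(tree))  # ('M1', 3)
```

Here M1 guarantees at least 3, while M2 and M3 guarantee only -1 and 2, so M1 is selected.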

I hope this can be helpful.

EDIT: Some notes on alpha-beta. The alpha-beta algorithm is hard to explain without graphical examples. For this reason I want to link one of the most detailed explanations of alpha-beta pruning I've ever found: this one. I think I cannot really do better than that. :)

The key point is: alpha-beta pruning adds two bounds to the nodes of plain minimax. These bounds can be used to decide whether a sub-tree should be expanded or not.

These bounds are:

Alpha: the maximum lower bound of possible solutions (the best value MAX can already guarantee).
Beta: the minimum upper bound of possible solutions (the best value MIN can already guarantee).

If, during the computation, we find a situation in which Beta <= Alpha, we can stop the computation for that sub-tree, because nothing found there can change the final decision.
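To make the cutoff concrete, here is a small sketch of alpha-beta on a hypothetical hand-written tree (nested lists with numeric leaves). The visited list is an addition for illustration only: it records which leaves were actually evaluated. Note that this sketch also cuts off when the two bounds are equal, which is the common formulation.

```python
def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax value of `node`, skipping sub-trees that cannot matter."""
    if not isinstance(node, list):
        visited.append(node)  # a leaf: record that it was evaluated
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, v)  # raise the lower bound
            if beta <= alpha:
                break  # MIN above will never let the game reach this node
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, v)  # lower the upper bound
        if beta <= alpha:
            break  # MAX above already has something at least as good
    return v


tree = [[3, 5], [2, 9], [1, 7]]  # MAX at the root, MIN one level below
visited = []
value = alphabeta(tree, float("-inf"), float("inf"), True, visited)
print(value)    # 3
print(visited)  # [3, 5, 2, 1]
```

The leaves 9 and 7 are never evaluated: once the second and third sub-trees are known to be worth at most 2 and 1, they cannot beat the 3 already found in the first one.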

Obviously check the previous link to understand how it works. ;)