Question

I'm currently developing a solver for a trick-based card game called Skat in a perfect-information situation. Although most people may not know the game, please bear with me; my problem is of a general nature.

Short introduction to Skat:
Basically, the players take turns playing one card each, and every three cards form a trick. Every card has a specific value. The score a player has achieved is the sum of the values of all cards contained in the tricks that player has won. I left out things that are unimportant for my problem, e.g. who plays against whom or how a trick is won.
What we should keep in mind is that there is a running score: when investigating a certain position, who played what beforehand (i.e. its history) is relevant to that score.

I have written an alpha-beta algorithm in Java which seems to work fine, but it's way too slow. The enhancement that seems most promising is the use of a transposition table. I read that when searching the tree of a Skat game, you encounter a lot of positions that have already been investigated.
And that's where my problem comes into play: if I find a position that has already been investigated before, the moves leading to this position were different. Hence, in general, the score (and alpha or beta) will be different, too.
This leads to my question: How can I determine the value of a position, if I know the value of the same position, but with a different history?
In other words: How can I decouple a subtree from its path to the root, so that it can be applied to a new path?
My first impulse was that it's just not possible, because alpha or beta could have been influenced by other paths, which might not be applicable to the current position, but...

There already seems to be a solution
...that I don't seem to understand. In Sebastian Kupferschmid's master thesis about a Skat solver, I found this piece of code (maybe C-ish / pseudocode?):

def ab_tt(p, alpha, beta):
    if p isa Leaf:
        return 0

    if hash.lookup(p, val, flag):
        if flag == VALID:
            return val
        elif flag == LBOUND:
            alpha = max(alpha, val)
        elif flag == UBOUND:
            beta = min(beta, val)
        if alpha >= beta:
            return val

    if p isa MAX_Node:
        res = alpha
    else:
        res = beta

    for q in succ(p):
        if p isa MAX_Node:
            succVal = t(q) + ab_tt(q, res - t(q), beta - t(q))
            res = max(res, succVal)
            if res >= beta:
                hash.add(p, res, LBOUND)
                return res
        elif p isa MIN_Node:
            succVal = t(q) + ab_tt(q, alpha - t(q), res - t(q))
            res = min(res, succVal)
            if res <= alpha:
                hash.add(p, res, UBOUND)
                return res
    hash.add(p, res, VALID)
    return res

It should be pretty self-explanatory. succ(p) is a function that returns every possible move at the current position. t(q) is what I believe to be the running score of the respective position (the points achieved so far by the declarer). Since I don't like copying stuff without understanding it, this should just be an aid for anyone who would like to help me out.

Of course, I have given this code some thought, but I can't wrap my head around one thing: by subtracting the current score from alpha/beta before calling the function again [e.g. ab_tt(q, res - t(q), beta - t(q))], there seems to be some kind of decoupling going on. But what exactly is the benefit if we store the position's value in the transposition table without doing the same subtraction right there, too? If we find a previously investigated position, how can we just return its value (in case it's VALID) or use the bound value for alpha or beta? The way I see it, neither storing nor retrieving values from the transposition table accounts for the specific histories of these positions. Or does it?
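To see the window-shifting part concretely, here is a toy Java sketch of my own (not from the thesis): it searches a single MAX node over a few leaf values once in absolute scores and once in the child's shifted window, and checks that the two agree, including where the beta cutoff fires.

```java
// Toy check of the window-shift identity used in ab_tt:
// searching a child in the shifted window (alpha - t, beta - t) and
// adding t back gives the same result as searching absolute scores.
public class WindowShift {
    // leaf values relative to the child position (points still to be won)
    static final int[] LEAVES = {3, 7, 1};

    // MAX over leaves in absolute terms; t is the running score at the child
    static int absoluteMax(int t, int alpha, int beta) {
        int res = alpha;
        for (int v : LEAVES) {
            res = Math.max(res, t + v);
            if (res >= beta) return res;   // beta cutoff in absolute terms
        }
        return res;
    }

    // Same search in the child's local frame, with the window shifted by t
    static int shiftedMax(int t, int alpha, int beta) {
        int res = alpha - t;
        for (int v : LEAVES) {
            res = Math.max(res, v);
            if (res >= beta - t) break;    // same cutoff, shifted by t
        }
        return t + res;                    // translate back to absolute score
    }

    public static void main(String[] args) {
        for (int t = 0; t <= 20; t++)
            for (int a = 0; a < 30; a++)
                for (int b = a + 1; b <= 30; b++)
                    if (absoluteMax(t, a, b) != shiftedMax(t, a, b))
                        throw new AssertionError("mismatch at t=" + t);
        System.out.println("identity holds");
    }
}
```

So the recursive call itself is sound; my question is really about the table entries.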

Literature:
There are almost no English sources out there that deal with AI in Skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid, Helmert. Unfortunately, the whole paper, and especially the elaboration on transposition tables, is rather compact.

Edit:

So that everyone can better imagine how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The actual score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha-beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in this trick to their score.

The card values are:

Rank    J   A   10  K   Q   9   8   7
Value   2   11  10  4   3   0   0   0
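For instance, the value of a trick is just the sum of its three cards' values; a tiny illustrative Java sketch (the names are my own):

```java
import java.util.Map;

public class CardValues {
    // card values from the table above
    static final Map<String, Integer> VALUE = Map.of(
        "J", 2, "A", 11, "10", 10, "K", 4,
        "Q", 3, "9", 0, "8", 0, "7", 0);

    static int trickValue(String... ranks) {
        int sum = 0;
        for (String r : ranks) sum += VALUE.get(r);
        return sum;
    }

    public static void main(String[] args) {
        // e.g. a trick of A, 10, K is worth 11 + 10 + 4 = 25 points
        System.out.println(trickValue("A", "10", "K")); // prints 25
    }
}
```

One suit (all eight ranks) totals 30 points, so the whole deck of four suits is worth 120, which is why the declarer's and defenders' scores always sum to 120.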

Solution

I solved the problem. Instead of doing weird subtractions upon each recursive call, as suggested by the reference in my question, I subtract the running score from the resulting alpha-beta value only when storing a position in the transposition table:

For exact values (the position hasn't been pruned):

transpo.put(hash, new int[] { TT_VALID, bestVal - node.getScore()});

If the node caused a beta-cutoff:

transpo.put(hash, new int[] { TT_LBOUND, bestVal - node.getScore()});

If the node caused an alpha-cutoff:

transpo.put(hash, new int[] { TT_UBOUND, bestVal - node.getScore()});

Where:

  • transpo is a HashMap<Long, int[]>
  • hash is the long value representing that position
  • bestVal is either the exact value or the value that caused a cutoff
  • TT_VALID, TT_LBOUND and TT_UBOUND are simple constants, describing the type of transposition table entry

However, this didn't work per se. After posting the same question on gamedev.net, a user named Álvaro gave me the deciding hint:
When storing exact scores (TT_VALID), I should only store positions that improved alpha.
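Putting the pieces together, here is a minimal, hypothetical sketch (method names like store/probeExact are my own, not the actual solver's) of how subtracting the running score on store and re-adding it on probe makes the entry history-independent:

```java
import java.util.HashMap;

public class TTSketch {
    static final int TT_VALID = 0, TT_LBOUND = 1, TT_UBOUND = 2;
    static final HashMap<Long, int[]> transpo = new HashMap<>();

    // store: subtract the running score so the entry depends only on the
    // cards still in play, not on the path taken to reach the position
    // (in the real solver, TT_VALID entries are additionally only stored
    // when the value improved alpha, per the hint above)
    static void store(long hash, int flag, int bestVal, int runningScore) {
        transpo.put(hash, new int[] { flag, bestVal - runningScore });
    }

    // probe: re-add the CURRENT path's running score before using the value
    static Integer probeExact(long hash, int runningScore) {
        int[] entry = transpo.get(hash);
        if (entry != null && entry[0] == TT_VALID)
            return entry[1] + runningScore;
        return null; // bound entries would adjust alpha/beta instead
    }

    public static void main(String[] args) {
        // position reached via a path worth 14 points, full value 30
        store(42L, TT_VALID, 30, 14);            // stored value: 16
        // the same position reached via a different path worth 20 points
        System.out.println(probeExact(42L, 20)); // prints 36 (= 16 + 20)
    }
}
```

The stored 16 is the subtree's worth by itself; each path that transposes into the position adds its own running score back on top.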

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow