Using minimax search for card games with imperfect information

https://stackoverflow.com//questions/12666119

11-12-2019

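Question

I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.

The card game actually has 4 players. So, in order to be able to use minimax etc., I simplify the game to "me" against the "others". After each "move", you can objectively read the evaluation of the current state from the game itself. When all 4 players have placed a card, the highest card wins them all, and the values of the cards count.

As you don't know how the cards are distributed among the other 3 players, I thought you must simulate all possible distributions ("worlds") of the cards that are not yours. You have 12 cards; the other 3 players have 36 cards in total. So my approach is the algorithm below, where player is a number between 1 and 3 identifying the three computer players that the program might need to find moves for, and -player stands for the opponents, namely all the other three players together.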

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    for (Card card : cards) {
        if (state.moveIsLegal(card)) { // if you are allowed to place this card
            // a move is the placing of a card (which returns a new game state)
            GameState futureState = state.testMove(card);
            // negamax: the child's score is from the opponents' point of view, so negate it;
            // -Integer.MAX_VALUE is the lower bound because -Integer.MIN_VALUE overflows
            int score = -negamaxSearch(-state.getPlayersTurn(), futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = card;
            }
        }
    }
    return bestMove; // the card to place (null if no card is legal)
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
        }
    }
    ArrayList<Card> cards;
    if (player >= 1 && player <= 3) {
        cards = state.getCards(player);
    }
    else {
        // pool the hands of everyone except computer player -player;
        // copy into a new list so the lists held by the game state are not modified
        cards = new ArrayList<Card>();
        for (int p = 0; p <= 3; p++) {
            if (p != -player) {
                cards.addAll(state.getCards(p));
            }
        }
    }
    int score;
    int nCards = cards.size();
    if (player > 0) { // make one move (it's player's turn)
        for (int i = 0; i < nCards; i++) {
            GameState futureState = state.testMove(cards.get(i));
            if (futureState != null) { // if move is valid
                // negate: the child's score is from the other side's point of view
                score = -negamaxSearch(-player, futureState, depthLeft-1, -beta, -alpha);
                if (score >= beta) {
                    return score; // beta cutoff
                }
                if (score > alpha) {
                    alpha = score; // alpha acts like max
                }
            }
        }
        return alpha;
    }
    else { // make three moves (it's the others' turn)
        for (int i = 0; i < nCards; i++) {
            GameState futureState = state.testMove(cards.get(i));
            if (futureState != null) { // if move is valid
                for (int k = 0; k < nCards; k++) {
                    if (k != i) {
                        GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                        if (futureStateLevel2 != null) { // if move is valid
                            for (int m = 0; m < nCards; m++) {
                                if (m != i && m != k) {
                                    GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                    if (futureStateLevel3 != null) { // if move is valid
                                        // negate: the child's score is from the other side's point of view
                                        score = -negamaxSearch(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                        if (score >= beta) {
                                            return score; // beta cutoff
                                        }
                                        if (score > alpha) {
                                            alpha = score; // alpha acts like max
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
        return alpha;
    }
}

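This seems to work fine, but even for a depth of 1 (depthLeft = 1), the program already needs to evaluate about 50,000 moves (placed cards) on average. That is too many, of course!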
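A rough count suggests where a number of this size comes from. The sketch below assumes, as in the code above, that the opponents' 36 cards are pooled and that any three distinct cards may be played in sequence, ignoring legality checks. The class and method names (WorldCount, orderedPicks, binomial, deals) are made up for illustration. It also counts how many distinct deals of the 36 unseen cards into three 12-card hands exist, which is what enumerating every "world" would involve.

```java
import java.math.BigInteger;

class WorldCount {
    // Number of ordered ways to pick k distinct cards out of n: n * (n-1) * ... * (n-k+1)
    static long orderedPicks(int n, int k) {
        long result = 1;
        for (int i = 0; i < k; i++) {
            result *= (n - i);
        }
        return result;
    }

    // Binomial coefficient C(n, k), computed exactly with BigInteger
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // after this step result == C(n - k + i, i), so the division is always exact
            result = result.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Ways to split 3 * handSize unseen cards into three labelled hands of handSize cards each
    static BigInteger deals(int handSize) {
        return binomial(3 * handSize, handSize).multiply(binomial(2 * handSize, handSize));
    }

    public static void main(String[] args) {
        System.out.println("opponent triples per own card: " + orderedPicks(36, 3));
        System.out.println("possible deals of 36 cards:    " + deals(12));
    }
}
```

With 36 pooled cards there are 36 * 35 * 34 = 42,840 ordered triples for each card of your own, which is in line with the observed figure. The number of complete deals is on the order of 10^15, far too many to enumerate.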

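So my questions are:

Is the implementation correct at all? Can you simulate a game like this, especially with regard to the imperfect information?
How can the algorithm be improved in speed and workload? For example, can I reduce the set of possible moves to a random subset of 50% to improve speed while still getting good results?
I found that the UCT algorithm might be a good solution. Do you know this algorithm? Can you help me implement it?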

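Solution

Minimax search as you've implemented it is the wrong approach for games with this much uncertainty. Since you don't know how the cards are distributed among the other players, your search will spend an exponential amount of time exploring games that could not happen given the actual distribution of the cards.

I think a better approach would be to start with good rules for play when you have little or no information about the other players' hands. Things like:

If you play first in a round, play your lowest card, since you have little chance of winning the round.
If you play last in a round, play your lowest card that will win the round. If you can't win the round, play your lowest card.
Have your program initially not bother with search and just play by these rules, and have it assume that all the other players use these heuristics as well. As the program observes which cards the first and last players of each round play, it can build up a table of information about the cards each player likely holds. For example: a 9 would have won this round, but player 3 didn't play one, so he must not have any cards of rank 9 or higher. As information about each player's hand accumulates, the search space will eventually be constrained to the point where a minimax search of the possible games can produce useful information about the next card to play.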
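The two rules and the inference table could be sketched roughly as follows. This is a simplified illustration, not the answerer's code: it assumes cards are plain integer ranks with no suits, that the highest rank wins the round, and that a player in a middle position also plays the lowest card (the rules above do not cover that case). RuleBasedPlayer, pickCard, observeLastPlayer and knownMaxRank are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RuleBasedPlayer {
    // knownMaxRank[p] = highest rank player p can still hold, as far as we can infer
    final int[] knownMaxRank;

    RuleBasedPlayer(int players, int highestRank) {
        knownMaxRank = new int[players];
        Arrays.fill(knownMaxRank, highestRank);
    }

    // Pick a card from a non-empty hand; trickSoFar holds the ranks already played this round.
    static int pickCard(List<Integer> hand, List<Integer> trickSoFar, int players) {
        int lowest = Collections.min(hand);
        boolean playingLast = trickSoFar.size() == players - 1;
        if (!playingLast) {
            return lowest; // first player (and, by assumption, middle players) play low
        }
        int toBeat = Collections.max(trickSoFar);
        Integer cheapestWinner = null;
        for (int card : hand) {
            if (card > toBeat && (cheapestWinner == null || card < cheapestWinner)) {
                cheapestWinner = card;
            }
        }
        // last player: cheapest winning card if one exists, otherwise the lowest card
        return cheapestWinner == null ? lowest : cheapestWinner;
    }

    // The last player did not beat highestBefore, so (if they follow the same rules)
    // they hold no card above that rank.
    void observeLastPlayer(int player, int played, int highestBefore) {
        if (played < highestBefore) {
            knownMaxRank[player] = Math.min(knownMaxRank[player], highestBefore);
        }
    }

    public static void main(String[] args) {
        System.out.println(pickCard(Arrays.asList(3, 7, 10), Arrays.asList(5, 6, 2), 4));
        RuleBasedPlayer table = new RuleBasedPlayer(4, 14);
        table.observeLastPlayer(3, 2, 8); // a 9 would have won, player 3 played a 2
        System.out.println(table.knownMaxRank[3]);
    }
}
```

The inference is only as reliable as the assumption that the other players really follow the same heuristics, so a real program would probably treat the table as a likelihood rather than a hard constraint.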

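Other tips

I want to clarify some details that the accepted answer doesn't really go into.

In many card games you can sample the unknown cards that your opponents could hold instead of generating all of them. When doing this sampling, you can take into account information such as short suits and the probability of holding certain cards given the play so far, and use it to weight the likelihood of each possible hand (each hand is a possible world that we solve independently). Then you solve each hand using perfect-information search. The best move over all of these worlds is often the best move overall, with some caveats.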
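A minimal sketch of this sampling loop is shown below, under several assumptions that are not from the answer: cards are plain integers, worlds are sampled uniformly (no weighting by inferred likelihood), and the perfect-information search is passed in as a callback. WorldSampler, WorldEvaluator, sampleWorld and chooseMove are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class WorldSampler {
    // Stand-in for a perfect-information search (e.g. negamax) over one fully known deal
    interface WorldEvaluator {
        double evaluate(int move, List<List<Integer>> opponentHands);
    }

    // Deal the unseen cards uniformly at random into hands of the given sizes
    // (the sizes must not add up to more than the number of unseen cards).
    static List<List<Integer>> sampleWorld(List<Integer> unseen, int[] handSizes, Random rng) {
        List<Integer> deck = new ArrayList<>(unseen);
        Collections.shuffle(deck, rng);
        List<List<Integer>> hands = new ArrayList<>();
        int next = 0;
        for (int size : handSizes) {
            hands.add(new ArrayList<>(deck.subList(next, next + size)));
            next += size;
        }
        return hands;
    }

    // Score every legal move in every sampled world and return the move with the best total.
    static int chooseMove(List<Integer> legalMoves, List<Integer> unseen, int[] handSizes,
                          int samples, WorldEvaluator evaluator, Random rng) {
        double[] totals = new double[legalMoves.size()];
        for (int s = 0; s < samples; s++) {
            List<List<Integer>> world = sampleWorld(unseen, handSizes, rng);
            for (int i = 0; i < legalMoves.size(); i++) {
                totals[i] += evaluator.evaluate(legalMoves.get(i), world);
            }
        }
        int best = 0;
        for (int i = 1; i < totals.length; i++) {
            if (totals[i] > totals[best]) {
                best = i;
            }
        }
        return legalMoves.get(best);
    }

    public static void main(String[] args) {
        // Toy evaluator: a move scores 1 if it beats every card in the sampled world
        WorldEvaluator beatsAll = (move, hands) -> {
            for (List<Integer> hand : hands) {
                for (int card : hand) {
                    if (card > move) return 0.0;
                }
            }
            return 1.0;
        };
        List<Integer> unseen = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(chooseMove(Arrays.asList(3, 9), unseen, new int[] {2, 2, 2},
                                      20, beatsAll, new Random()));
    }
}
```

To use the weighting described above, each world's scores could be multiplied by an estimated likelihood of that world before being added to the totals.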

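In games like poker this won't work very well, because the game is all about hidden information. You have to balance your actions precisely to keep the information about your hand hidden.

But in trick-based card games this works pretty well, particularly since new information is being revealed all the time. Really good players have a good idea of what everyone holds anyway. So reasonably strong Skat and Bridge programs have been based on these ideas.

If you can completely solve the underlying world, that is best, but if you can't, you can use minimax or UCT to choose the best move in each world. There are also hybrid algorithms (ISMCTS) that try to fold this process into a single search. Be careful about the claims made for them. Simple sampling approaches are easier to code, and you should try the simpler approach before a more complex one.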
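For orientation, UCT is Monte Carlo tree search in which each node picks the child to explore with the UCB1 rule: average reward plus an exploration bonus. The sketch below shows only that selection step, not a full tree search; the class name Ucb1 and the array-based statistics are assumptions made for the example, and the exploration constant c is a tuning parameter (the square root of 2 is a common starting point when rewards lie in [0, 1]).

```java
class Ucb1 {
    // UCB1 value of one child: mean reward + c * sqrt(ln(parentVisits) / visits)
    static double value(double totalReward, int visits, int parentVisits, double c) {
        if (visits == 0) {
            return Double.POSITIVE_INFINITY; // try every move at least once
        }
        return totalReward / visits + c * Math.sqrt(Math.log(parentVisits) / visits);
    }

    // Index of the child to explore next, given per-child reward sums and visit counts
    static int select(double[] totalReward, int[] visits, double c) {
        int parentVisits = 0;
        for (int v : visits) {
            parentVisits += v;
        }
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            double v = value(totalReward[i], visits[i], parentVisits, c);
            if (v > bestValue) {
                bestValue = v;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] rewards = {6.0, 0.5};
        int[] visits = {10, 1};
        // the rarely visited second move gets a large exploration bonus
        System.out.println(select(rewards, visits, Math.sqrt(2)));
    }
}
```

In a sampled world, the reward fed back after each simulated playout would be the outcome of that playout from the point of view of the player choosing the move.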

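Here are some research papers that give more information on when the sampling approach to imperfect information has worked well:

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search (This paper analyzes when the sampling approach is likely to work.)

Improving State Evaluation, Inference, and Search in Trick-Based Card Games (This paper describes the use of sampling in Skat.)

Imperfect information in a computationally challenging game (This paper describes sampling in Bridge.)

Information Set Monte Carlo Tree Search (This paper merges sampling and UCT/Monte Carlo Tree Search to avoid the issues in the first reference.)

The problem with rule-based approaches like the one in the accepted answer is that they can't take advantage of computational resources beyond those required to create the initial rules. Additionally, rule-based approaches are limited by the power of the rules you can write. Search-based approaches can use the power of combinatorial search to produce much stronger play than the author of the program could.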

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow