Problem

I'm writing an Othello engine using Minimax with alpha-beta pruning. It works OK, but I've run into the following problem:

When the algorithm finds that a position is lost, it returns -INFINITY as expected, but in that case I can't track the 'best' move... the position is already lost, but it should still return a valid move (preferably a move that survives longer, the way a good chess engine would). Here is the code:

private float minimax(OthelloBoard board, OthelloMove best, float alpha, float beta, int depth)
{             
    OthelloMove garbage = new OthelloMove();             
    int currentPlayer = board.getCurrentPlayer();

    if (board.checkEnd())
    {                        
        int bd = board.countDiscs(OthelloBoard.BLACK);
        int wd = board.countDiscs(OthelloBoard.WHITE);

        if ((bd > wd) && currentPlayer == OthelloBoard.BLACK)                
            return INFINITY;
        else if ((bd < wd) && currentPlayer == OthelloBoard.BLACK)                           
            return -INFINITY;            
        else if ((bd > wd) && currentPlayer == OthelloBoard.WHITE)                            
            return -INFINITY;            
        else if ((bd < wd) && currentPlayer == OthelloBoard.WHITE)                            
            return INFINITY;            
        else                             
            return 0.0f;            
    }
    //search until the end? (true during end game phase)
    if (!solveTillEnd )
    {
        if (depth == maxDepth)
            return OthelloHeuristics.eval(currentPlayer, board);
    }

    ArrayList<OthelloMove> moves = board.getAllMoves(currentPlayer);             

    for (OthelloMove mv : moves)
    {                        
        board.makeMove(mv);            
        float score = - minimax(board, garbage, -beta,  -alpha, depth + 1);           
        board.undoMove(mv);             

        if(score > alpha)
        {  
            //Set Best move here
            alpha = score;                
            best.setFlipSquares(mv.getFlipSquares());
            best.setIdx(mv.getIdx());        
            best.setPlayer(mv.getPlayer());                              
        }

        if (alpha >= beta)
            break;                

    }                
    return alpha;
}

I call it like this:

AI ai = new AI(board, maxDepth, solveTillEnd);

//create empty (invalid) move to hold best move
OthelloMove bestMove = new OthelloMove();
ai.bestFound = bestMove;
ai.minimax(board, bestMove, -INFINITY, INFINITY, 0);

//dispatch a Thread
 new Thread(ai).start();
//wait for thread to finish

OthelloMove best = ai.bestFound;

When a lost position is searched (imagine, for example, a position that is lost 10 moves later), the best variable above stays equal to the empty invalid move that was passed in as an argument... why??

Thanks for any help!


Solution

Your problem is that you're using -INFINITY and +INFINITY as win/loss scores. You should have scores for win/loss that are higher/lower than any other positional evaluation score, but not equal to your infinity values. This will guarantee that a move will be chosen even in positions that are hopelessly lost.
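
A minimal sketch of what such constants might look like (the names and values are illustrative, not from the asker's engine):

// Illustrative score constants -- names and values are assumptions.
// WIN_SCORE/LOSS_SCORE must dominate any heuristic eval, yet stay
// strictly inside the (-INFINITY, +INFINITY) alpha-beta window, so
// even a hopelessly lost root position produces score > alpha at
// least once and a best move gets recorded.
public static final float INFINITY   = 1.0e9f;   // window bound only
public static final float WIN_SCORE  = 1.0e6f;   // any won position
public static final float LOSS_SCORE = -1.0e6f;  // any lost position

With these, the board.checkEnd() branch would return WIN_SCORE or LOSS_SCORE instead of INFINITY or -INFINITY, while the root call keeps passing -INFINITY and INFINITY as the initial alpha and beta.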

Other tips

It's been a long time since I implemented minimax, so I might be wrong, but it seems to me that when your code encounters a winning or losing position, it does not update the best variable (this happens in the board.checkEnd() branch at the top of your method).
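
One way to guarantee that the move loop records some legal move even when every reply scores -INFINITY is a first-move flag; this is a sketch against the question's own loop, and haveBest is an illustrative addition, not from the post:

boolean haveBest = false;
for (OthelloMove mv : moves)
{
    board.makeMove(mv);
    float score = -minimax(board, garbage, -beta, -alpha, depth + 1);
    board.undoMove(mv);

    if (score > alpha || !haveBest)  // always accept the first legal move
    {
        alpha = score;
        best.setFlipSquares(mv.getFlipSquares());
        best.setIdx(mv.getIdx());
        best.setPlayer(mv.getPlayer());
        haveBest = true;
    }

    if (alpha >= beta)
        break;
}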

Also, if you want your algorithm to try to win by as much as possible, or to lose by as little as possible when it can't win, I suggest you update your eval function. In a winning position it should return a large value (larger than any non-winning position), and the larger the margin of victory, the larger the value. In a losing position it should return a large negative value (lower than any non-losing position), and the larger the margin of defeat, the lower the value.
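
A sketch of such a terminal eval, reusing countDiscs from the question (WIN_SCORE/LOSS_SCORE and the linear margin bonus are assumptions):

// Terminal evaluation from "player"'s point of view: prefer big wins
// and, when losing, small losses. margin is the final disc difference.
static float evalTerminal(OthelloBoard board, int player)
{
    int other = (player == OthelloBoard.BLACK) ? OthelloBoard.WHITE
                                               : OthelloBoard.BLACK;
    int margin = board.countDiscs(player) - board.countDiscs(other);

    if (margin > 0)
        return WIN_SCORE + margin;   // larger win => higher score
    if (margin < 0)
        return LOSS_SCORE + margin;  // larger loss => lower score
    return 0.0f;                     // draw
}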

It seems to me (without having tried it) that if you update your eval function that way and skip the if (board.checkEnd()) check altogether, your algorithm should work fine (unless there are other problems with it). Good luck!

If you can detect that a position is truly won or lost, then that implies you are solving the endgame. In this case, your evaluation function should be returning the final score of the game (e.g. 64 for a total victory, 31 for a narrow loss), since this can be calculated accurately, unlike the estimates that you will evaluate in the midgame.
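
In that spirit, a sketch of an exact endgame score as the final disc differential from the side to move (the sign convention is an assumption about how the engine orients its scores):

// Exact score once board.checkEnd() is true: positive means "player"
// won, and the magnitude is the final margin (64 for a wipeout).
static float finalScore(OthelloBoard board, int player)
{
    int other = (player == OthelloBoard.BLACK) ? OthelloBoard.WHITE
                                               : OthelloBoard.BLACK;
    return board.countDiscs(player) - board.countDiscs(other);
}

Inside a full endgame solve (solveTillEnd), these exact margins are directly comparable to each other, so the INFINITY/-INFINITY terminal returns are not needed at all.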

License: CC-BY-SA with attribution. Not affiliated with StackOverflow.