Question

How to use Minimax trees with Q-learning?

I want to implement a Q-learning Connect Four agent, and I have heard that adding Minimax trees to it helps.

Would that actually be helpful?

Solution

Q-learning is a temporal-difference learning algorithm. For every possible state (board), it learns the value of the available actions (moves). However, it is not directly suitable for combining with Minimax, because the Minimax algorithm needs an evaluation function that returns the value of a position, not the value of an action in that position.
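
To make the distinction concrete, here is a minimal tabular Q-learning sketch for Connect Four. The board encoding, reward scheme, and hyperparameters are assumptions made for the example, not part of the original answer; the point is that the table stores one value per (state, action) pair rather than per position.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch for Connect Four (assumed setup: the state is the
# board encoded as a hashable tuple, actions are column indices 0-6).
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> learned action value

def choose_action(state, legal_actions):
    """Epsilon-greedy selection over the learned action values."""
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_legal_actions, done):
    """Q-learning backup: note that it learns values of (state, action) pairs."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in next_legal_actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```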

However, temporal-difference methods can be used to learn such an evaluation function. Most notably, Gerald Tesauro used the TD(λ) ("TD lambda") algorithm to create TD-Gammon, a human-competitive Backgammon-playing program. He wrote an article describing the approach, which you can find here.
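
As a rough illustration of that idea, the sketch below learns a linear position evaluation from self-play games in the TD-Gammon spirit: intermediate positions bootstrap on the next position's value, and the final position is pushed toward the actual game outcome. The feature function `features(board)` and the hyperparameters are assumptions for the example.

```python
import numpy as np

# TD(lambda) sketch for learning a position evaluation V(s) with a linear
# function approximator. Rewards are zero except for the final game outcome
# (+1 win, -1 loss, 0 draw); credit is spread backwards by eligibility traces.
ALPHA, GAMMA, LAMBDA = 0.01, 1.0, 0.7

def td_lambda_episode(boards, final_reward, weights, features):
    """Update the evaluation weights from one self-play game."""
    trace = np.zeros_like(weights)
    for t in range(len(boards) - 1):
        x = features(boards[t])
        delta = GAMMA * (weights @ features(boards[t + 1])) - weights @ x
        trace = GAMMA * LAMBDA * trace + x   # eligibility trace over features
        weights = weights + ALPHA * delta * trace
    # Final step: push the last position's value toward the actual outcome.
    x = features(boards[-1])
    delta = final_reward - weights @ x
    trace = GAMMA * LAMBDA * trace + x
    return weights + ALPHA * delta * trace
```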

TD(λ) was later extended to TDLeaf(λ), specifically to better deal with Minimax searches. TDLeaf(λ) has been used, for example, in the chess program KnightCap. You can read about TDLeaf in this paper.
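
A heavily simplified sketch of the TDLeaf idea (effectively the λ = 0 case, not the full published algorithm): the temporal-difference error is computed between the evaluations of the principal-variation leaves returned by successive minimax searches, and the weight update is applied at the leaf position rather than at the root. The leaf boards and feature function are assumed inputs.

```python
# Simplified TDLeaf-style update (lambda = 0 case). `weights` and the vectors
# returned by `features(...)` are assumed to be NumPy arrays; `leaf_t` and
# `leaf_t1` are the principal-variation leaf positions found by the minimax
# searches at times t and t+1.
def tdleaf_step(weights, features, leaf_t, leaf_t1, alpha=0.01):
    v_t = weights @ features(leaf_t)
    v_t1 = weights @ features(leaf_t1)
    delta = v_t1 - v_t                      # TD error on the leaf evaluations
    return weights + alpha * delta * features(leaf_t)
```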

OTHER TIPS

Minimax lets you look a number of moves into the future and play so as to maximize your chances of winning within that horizon. This works well for Connect Four, where a game can end at almost any moment and the number of moves available at each turn is not very large. Q-learning would provide the value function that guides the Minimax search.
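
For example, a depth-limited minimax (written in negamax form) would call the learned evaluation whenever the search horizon is reached. The helpers `legal_moves`, `apply_move`, `is_terminal`, `terminal_score`, and `evaluate` are hypothetical and depend on whichever board representation and learned value function you use.

```python
import math

# Depth-limited negamax sketch for Connect Four that falls back on a learned
# evaluation function at the search horizon. All helper functions named here
# are assumed to be provided by your own board/value-function code.

def negamax(board, depth, player):
    """Best achievable score for `player`, looking `depth` plies ahead."""
    if is_terminal(board):
        return terminal_score(board, player)   # win/loss/draw found inside the search
    if depth == 0:
        return evaluate(board, player)         # learned value function at the horizon
    best = -math.inf
    for move in legal_moves(board):
        child = apply_move(board, move, player)
        best = max(best, -negamax(child, depth - 1, -player))
    return best

def best_move(board, player, depth=4):
    """Pick the column whose negamax value is highest for `player`."""
    return max(legal_moves(board),
               key=lambda m: -negamax(apply_move(board, m, player), depth - 1, -player))
```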

Littman combined minimax with Q-learning and proposed the Minimax-Q learning algorithm in his pioneering paper "Markov Games as a Framework for Multi-Agent Reinforcement Learning". His work addresses zero-sum games in multi-agent settings. Hu & Wellman later extended it to Nash Q-learning, which you can find here.
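
For reference, a Minimax-Q style update replaces Q-learning's max over your own actions with a maximin value over both players' actions. Littman's algorithm solves a linear program for the optimal mixed strategy at each state; the sketch below uses the simpler pure-strategy maximin value, so it is only an approximation of the published method, with made-up hyperparameters.

```python
from collections import defaultdict

# Simplified Minimax-Q style update for a two-player zero-sum game.
# The full algorithm computes a mixed maximin strategy via linear programming;
# this sketch uses the pure-strategy maximin value for brevity.
ALPHA, GAMMA = 0.1, 0.95
Q = defaultdict(float)  # Q[(state, my_action, opponent_action)]

def maximin_value(state, my_actions, opp_actions):
    """Value of `state` when I pick the action that is best against the worst reply."""
    return max(min(Q[(state, a, o)] for o in opp_actions) for a in my_actions)

def minimax_q_update(state, my_action, opp_action, reward,
                     next_state, next_my_actions, next_opp_actions, done):
    target = reward
    if not done:
        target += GAMMA * maximin_value(next_state, next_my_actions, next_opp_actions)
    key = (state, my_action, opp_action)
    Q[key] += ALPHA * (target - Q[key])
```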

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow