Question


How can I make an AI learn to play a game from scratch? As a small example, say the AI plays blackjack. Ignoring splits, the cards left in the deck, and so on, the AI can either hit or stand. It doesn't know what either move does until it starts losing games; it should then learn that hitting too often makes you lose, and so does standing too early. I read that this is called Reinforcement Learning, but I don't know how to implement it, which modules to use, and the like...

Where should I start?

My ultimate goal is to create a kind of game where the user and the AI both play, not against each other, but each on their own against the game mechanics [not co-op], and both learn by playing. The game would change every once in a while: new mechanics would appear, making the game harder for both the player and the AI. The AI would learn both by playing the game and by watching the player win or lose. I don't want the computer to learn too quickly, though; I would like both to be on the same 'ground'. Perhaps in a final level the player could play against the AI. Am I heading in the right direction, or should I try some other approach?

Edit: I thought that might be too broad, so I searched a bit about ML and AI and found some modules that might help: scikit-learn, PyBrain, neurolab and also RLToolkit. With the first two I didn't really understand how to get started; the documentation is very unclear for a newcomer like me. I haven't tried neurolab yet, as I don't really understand what an Artificial Neural Network [ANN] is or how it can help me. The last one, which is more specific to Reinforcement Learning, doesn't have any documentation.


Solution

While this is not a complete answer, the basic principle goes:

Where the outcome is unpredictable, think of it as current state + chosen move = outcome. For any given state of the game (in blackjack, having a certain number or combination of cards, possibly combined with how many unknown cards the other players hold, or which cards have been seen since the last shuffle), there are a number of possible moves (hit, stand). You try one of them and record whether it gives you a good, a bad, or an in-between outcome. The next time you see the same state, you pick the move that has given the best statistical outcome so far, with some percentage of randomness so that the other moves still get tried.
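As a rough illustration of that record-and-reuse idea, here is a minimal Python sketch. The names `stats`, `choose_action`, `record` and the `epsilon` parameter are my own, not from any library, and the state is assumed to be just the hand total. It is one possible way to keep the statistics, not the only one.

```python
import random
from collections import defaultdict

ACTIONS = ("hit", "stand")

# stats[state][action] = [sum of outcomes, number of times tried]
# (hypothetical layout, chosen only for this sketch)
stats = defaultdict(lambda: {a: [0.0, 0] for a in ACTIONS})

def average(state, action):
    """Average outcome recorded so far for this move in this state (0.0 if untried)."""
    total, count = stats[state][action]
    return total / count if count else 0.0

def choose_action(state, epsilon=0.1, rng=random):
    """Mostly pick the best move seen so far, but try a random one epsilon of the time."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: average(state, a))

def record(state, action, outcome):
    """Store the outcome (e.g. +1 for a win, -1 for a loss) for this state and move."""
    entry = stats[state][action]
    entry[0] += outcome
    entry[1] += 1

# Example: after one good "stand" and one bad "hit" on a total of 18,
# the greedy choice (epsilon=0) on 18 is "stand".
record(18, "stand", 1.0)
record(18, "hit", -1.0)
best = choose_action(18, epsilon=0.0)
```

The random fraction matters: without it, a move that happened to lose the first time would never be tried again.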


If a game takes multiple moves and you don't get an actual result until the end, you keep track of every (state, tried move) pair so far; once you get a result, you apply it to every step along the way.
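A small Python sketch of that bookkeeping, assuming the result is a single number (+1 win, -1 loss) and that each (state, move) pair simply keeps a running average. The names `value`, `visits` and `apply_result` are made up for this example.

```python
from collections import defaultdict

# Running average of final results seen after each (state, move) pair.
value = defaultdict(float)
# How many times each (state, move) pair has been tried.
visits = defaultdict(int)

def apply_result(history, result):
    """Credit the final result to every (state, move) pair tried in this game."""
    for key in history:
        visits[key] += 1
        # Incremental mean: new_avg = old_avg + (result - old_avg) / n
        value[key] += (result - value[key]) / visits[key]

# Game 1: hit on 12, hit again on 17, then bust (a loss).
apply_result([(12, "hit"), (17, "hit")], -1.0)
# Game 2: hit on 12, stand on 15, and win.
apply_result([(12, "hit"), (15, "stand")], 1.0)
```

After these two games, hitting on 12 averages out to 0 (one loss, one win), while hitting on 17 is still rated -1.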

Once this is done, you get it to play a huge number of games, and it should get better as it goes.
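To make the "play a huge number of games" step concrete, here is a self-contained Python sketch of a training loop on a deliberately simplified blackjack. Everything about the game is an assumption made for brevity: an infinite deck, aces always worth 1, a dealer that draws until reaching 17, and the hand total as the only state. The function names are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ("hit", "stand")

def draw(rng):
    # Infinite deck; face cards count as 10, aces always count as 1 (simplification).
    return min(rng.randint(1, 13), 10)

def play_game(value, visits, rng, epsilon=0.1):
    """Play one game, then credit its result to every (total, move) pair used."""
    player = draw(rng) + draw(rng)
    history = []
    while player <= 21:
        if rng.random() < epsilon:
            move = rng.choice(ACTIONS)  # explore
        else:
            move = max(ACTIONS, key=lambda a: value[(player, a)])  # exploit
        history.append((player, move))
        if move == "stand":
            break
        player += draw(rng)
    if player > 21:
        result = -1.0  # player busts
    else:
        dealer = draw(rng) + draw(rng)
        while dealer < 17:
            dealer += draw(rng)
        if dealer > 21 or player > dealer:
            result = 1.0
        elif player == dealer:
            result = 0.0
        else:
            result = -1.0
    for key in history:
        visits[key] += 1
        value[key] += (result - value[key]) / visits[key]  # running average
    return result

def train(games=20000, seed=0):
    """Play many games and return the learned averages and visit counts."""
    rng = random.Random(seed)
    value = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(games):
        play_game(value, visits, rng)
    return value, visits

value, visits = train()
```

After training, comparing `value[(total, "hit")]` with `value[(total, "stand")]` for each total shows which move the table currently prefers. Lowering `games` or raising `epsilon` would be one crude way to slow the learning down.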

The trick, usually, is to work out what constitutes a "state". The more possible states there are, the more games have to be played before the AI gets good, and the larger your database will be. In blackjack, the state might be just the sum of your card values (which gives you about 20 states), or it might include how many of those cards are aces (which gives you, I guess, around 40 states); it might include how many cards the other players have; it might include exactly which values you have in your hand but not the suits (if you have 4 aces, you know no one else has an ace); or it might even include (pointlessly for blackjack) the order and suit of the cards.
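As one possible compact encoding, the sketch below maps a hand to a (total, usable ace) pair. This is a slightly different choice from counting aces: it only records whether one ace can currently count as 11 without busting. The function name `encode_state` and the card representation (ace = 1, face cards = 10) are assumptions for this example.

```python
def encode_state(cards):
    """Map a hand (list of card values, ace = 1, face cards = 10) to (total, usable_ace)."""
    total = sum(cards)
    # An ace may count as 11 instead of 1 if that does not bust the hand.
    usable_ace = 1 in cards and total + 10 <= 21
    if usable_ace:
        total += 10
    return total, usable_ace

soft_17 = encode_state([1, 6])      # ace counted as 11
hard_17 = encode_state([1, 6, 10])  # ace forced to count as 1
```

Two hands that map to the same pair share one entry in the table, which keeps the number of states small at the cost of ignoring everything else about the hand.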

In some cases, the "state" might be more abstract. For example, in chess there are far too many possible "states" to learn them all, so we have to abstract. I don't know what's usually used for this; perhaps what is attacking what and what is defending what, how many squares are covered by how many pieces, which pieces are defended by what, etc.; or

You might also want to consider what constitutes "good" and "bad" outcomes. You might assume that, for blackjack, a win is good and a loss is bad, and that's all there is to it. However, there is something to be avoided even more than losing: making an invalid move. In blackjack, assuming your AI does not know the rules, splitting when your hand is anything other than a pair is far worse than (possibly) losing. If you count such a move as a "loss", the AI will eventually get the hint and stop doing it.
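A minimal Python sketch of such an outcome function, assuming numeric outcomes of +1, 0 and -1 and treating an invalid move exactly like a loss. The constant names, `is_valid` and `outcome_for` are invented for this example, and the only rule checked is that a split needs a pair.

```python
OUTCOME_WIN = 1.0
OUTCOME_DRAW = 0.0
OUTCOME_LOSS = -1.0
# Counted the same as a loss here; a harsher penalty is another option.
OUTCOME_INVALID = OUTCOME_LOSS

def is_valid(move, hand):
    """Check the move against the (simplified) rules for the given hand."""
    if move in ("hit", "stand"):
        return True
    if move == "split":
        return len(hand) == 2 and hand[0] == hand[1]
    return False

def outcome_for(move, hand, game_result):
    """Return the number to record for this move; game_result is 'win', 'draw' or 'loss'."""
    if not is_valid(move, hand):
        return OUTCOME_INVALID
    return {"win": OUTCOME_WIN, "draw": OUTCOME_DRAW, "loss": OUTCOME_LOSS}[game_result]

bad_split = outcome_for("split", [10, 7], "win")   # invalid, so recorded as a loss
good_split = outcome_for("split", [8, 8], "win")   # valid, recorded as a win
```

Because the learner only ever sees the number, it does not need to be told the rule; it just finds that splitting a non-pair never pays.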

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow