Question

I am working on a card game environment for OpenAI Gym and am wondering how to shape its reward function. In each round of the game, every player plays one card from their hand, but not every card is playable: which cards are allowed depends on a card that one of the players has played earlier. For every set of played cards there is a total order, so the player with the highest card wins the round.
When the agent's card is rejected as invalid, I want to give the agent some reward.

For an invalid card, it is hard to say whether it is any closer to one of the valid cards than any other invalid card would be. The agent should also learn that this card is not playable at this point.

For completeness: the agent gets a discrete observation of everything it can remember of the game, namely its own cards, the cards played in the current round, the cards played in past rounds, and the game mode (which defines the total order of the cards). It then plays a discrete action, which is either a game mode at the beginning of the game or a card during a round. After that it receives one of two rewards: a reward because its card was rejected, or a reward based on whether it won the round. The game awards a certain number of points for a won round, depending on the combination of cards played in that round.
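To make the two reward cases concrete, here is a minimal sketch of the logic described above. It is not tied to the Gym API; the function name `step_reward` and all of its parameters are hypothetical, and the values passed for `rejection_reward` and `loss_reward` are exactly the open question, so they are left as arguments rather than fixed.

```python
def step_reward(card_is_valid: bool,
                won_round: bool,
                round_points: float,
                rejection_reward: float,
                loss_reward: float) -> float:
    """Return the reward for one action (hypothetical helper, not a Gym call).

    rejection_reward and loss_reward are placeholders: choosing their
    sign and magnitude is the design decision being asked about.
    """
    if not card_is_valid:
        # Case 1: the chosen card is not playable and gets rejected.
        return rejection_reward
    if won_round:
        # Case 2a: the agent wins and receives the points the game
        # awards for this combination of played cards.
        return round_points
    # Case 2b: the card was valid but another player won the round.
    return loss_reward


# Example call with arbitrary placeholder values.
reward = step_reward(card_is_valid=True, won_round=True,
                     round_points=10.0,
                     rejection_reward=-1.0, loss_reward=0.0)
```

Keeping the rejection case separate from the round outcome makes it easy to try different values for each without touching the rest of the environment.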

My question is how to shape the rewards for card rejection and for winning a round. Any ideas? Should they be positive or negative?

In case any more details are required, just ask for them.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange