Question

I'm trying to design an OpenAI Gym environment for a fairly simple board game in which each player has 16 pieces that all move in exactly the same way.

The board is 10x10, and each piece can move UP, DOWN, LEFT, RIGHT, UP_LEFT, UP_RIGHT, DOWN_LEFT or DOWN_RIGHT. A piece moves in the chosen direction exactly as many fields as there are pieces on that line, including the moving piece itself. So if I want to go LEFT, I count all other pieces to my left AND to my right, add 1 for myself, and then move that many fields to the left. The path may be obstructed, though, in which case the move is not possible.
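To make the movement rule concrete, here is a minimal sketch of the distance and legality check. It assumes the board is a 10x10 list of lists with `0` for an empty field, counts pieces of both players on the line, and treats a move as obstructed if any field on the path, including the destination, is occupied. The real obstruction rule of the game may differ (for example, if landing on an opponent's piece captures it). The names `DIRECTIONS`, `pieces_on_line` and `move_target` are hypothetical.

```python
SIZE = 10   # 10x10 board
EMPTY = 0   # assumed encoding: 0 = empty, any other value = a piece

# Direction name -> (d_row, d_col); row 0 is assumed to be the top of the board.
DIRECTIONS = {
    "UP": (-1, 0),
    "DOWN": (1, 0),
    "LEFT": (0, -1),
    "RIGHT": (0, 1),
    "UP_LEFT": (-1, -1),
    "UP_RIGHT": (-1, 1),
    "DOWN_LEFT": (1, -1),
    "DOWN_RIGHT": (1, 1),
}


def pieces_on_line(board, row, col, d_row, d_col):
    """Count every piece on the whole line through (row, col) along this axis,
    looking both ways, including the piece standing on (row, col)."""
    count = 1 if board[row][col] != EMPTY else 0
    for sign in (1, -1):
        r, c = row + sign * d_row, col + sign * d_col
        while 0 <= r < SIZE and 0 <= c < SIZE:
            if board[r][c] != EMPTY:
                count += 1
            r += sign * d_row
            c += sign * d_col
    return count


def move_target(board, row, col, direction):
    """Return the destination (row, col), or None if the move is not possible."""
    if board[row][col] == EMPTY:
        return None  # no piece to move
    d_row, d_col = DIRECTIONS[direction]
    distance = pieces_on_line(board, row, col, d_row, d_col)
    dest_r, dest_c = row + distance * d_row, col + distance * d_col
    if not (0 <= dest_r < SIZE and 0 <= dest_c < SIZE):
        return None  # the move would leave the board
    # Assumed obstruction rule: every field on the path, destination included, must be empty.
    for step in range(1, distance + 1):
        if board[row + step * d_row][col + step * d_col] != EMPTY:
            return None
    return dest_r, dest_c


example_board = [[EMPTY] * SIZE for _ in range(SIZE)]
example_board[5][5] = 1  # the piece we move
example_board[5][0] = 2  # one more piece on the same row, to its left
print(move_target(example_board, 5, 5, "RIGHT"))  # (5, 7): 2 pieces on the row
```

Returning `None` for impossible moves keeps legality in one function, which the environment can reuse both to reject moves and to build a mask of legal actions.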

So my question is: how could I implement an action space for this? Would a Discrete space of size (size of the board) * (number of directions [UP, DOWN, ...]) be suitable? And how can I teach the RL agent (PPO2) that a move is not possible? Should I just give a negative reward and return the same state as before?
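Because the pieces are identical, an action can be identified by the field it starts from plus a direction, which gives exactly the proposed Discrete(10 * 10 * 8) = Discrete(800) space without needing piece IDs. Below is a minimal sketch of one possible index layout and of a boolean mask of legal actions. The layout `action = square * 8 + direction` and the names `encode_action`, `decode_action`, `legal_action_mask` and the `is_legal(row, col, direction)` callback are hypothetical; the callback would wrap whatever legality check the environment uses.

```python
SIZE = 10
DIRECTION_NAMES = ["UP", "DOWN", "LEFT", "RIGHT",
                   "UP_LEFT", "UP_RIGHT", "DOWN_LEFT", "DOWN_RIGHT"]
N_ACTIONS = SIZE * SIZE * len(DIRECTION_NAMES)  # 800 discrete actions


def encode_action(row, col, direction_index):
    """Map (start field, direction) to a single integer in [0, N_ACTIONS)."""
    return (row * SIZE + col) * len(DIRECTION_NAMES) + direction_index


def decode_action(action):
    """Inverse of encode_action: integer -> (row, col, direction name)."""
    square, direction_index = divmod(action, len(DIRECTION_NAMES))
    row, col = divmod(square, SIZE)
    return row, col, DIRECTION_NAMES[direction_index]


def legal_action_mask(is_legal):
    """Return N_ACTIONS booleans; True where is_legal(row, col, direction) holds.
    is_legal is a hypothetical callback supplied by the environment."""
    return [is_legal(*decode_action(a)) for a in range(N_ACTIONS)]


print(decode_action(encode_action(3, 4, 2)))  # (3, 4, 'LEFT')
```

At any moment most of the 800 actions are illegal (only fields holding your own pieces can start a move), so a negative reward plus unchanged state tends to spend many samples teaching the agent which actions are meaningless. Masking illegal actions out of the policy's distribution avoids that. As far as I know, PPO2 in Stable Baselines has no built-in masking, but `MaskablePPO` from sb3-contrib (Stable-Baselines3) accepts such a boolean mask; with plain PPO2, the penalty approach remains a workable fallback.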

I would greatly appreciate help :)

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange