I have a custom environment with a multi-discrete action space.

The action and observation spaces are as follows:

Action:

MultiDiscrete([  3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121
   3 121 121 121   3 121 121 121   3 121 121 121   3 121 121 121   3 121
 121 121   3 121 121 121   3 121 121 121   3 121 121 121])
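The action vector above has 68 entries: the group `[3, 121, 121, 121]` repeated 17 times. The pure-Python sketch below shows why flattening it into a single `Discrete` space is hopeless, and how many outputs a network would need if it predicted each sub-action separately. The names `ACTION_NVEC`, `joint_action_count`, and `factored_output_size` are illustrative and do not come from any library. `math.prod` requires Python 3.8+.

```python
import math

# The action space printed above: the group [3, 121, 121, 121] repeated 17 times.
ACTION_NVEC = [3, 121, 121, 121] * 17

def joint_action_count(nvec):
    """Distinct joint actions if the space were flattened into one Discrete space."""
    return math.prod(nvec)

def factored_output_size(nvec):
    """Outputs needed when each sub-action gets its own categorical head."""
    return sum(nvec)

print(len(ACTION_NVEC))                   # 68
print(factored_output_size(ACTION_NVEC))  # 6222
print(f"~1e{int(math.log10(joint_action_count(ACTION_NVEC)))}")  # ~1e114
```

A single discrete head, which is what DQN-style agents expect, would need roughly 10^114 outputs. A factored head needs only a few thousand.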

Observation:

MultiDiscrete([100   3   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121   2 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121
 121 121 121 121 121 121 121 121 121 121 121 121 121 121 121])
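Neural-network agents usually expect a flat float vector as input, so a common preprocessing step is to one-hot encode each observation entry. This is a minimal sketch that reads the printed observation space as `[100, 3]`, then 35 pairs of `[2, 121]`, then 33 entries of `121` (105 entries in total). `OBS_NVEC` and `one_hot_observation` are hypothetical names.

```python
# Observation nvec as printed above: [100, 3], 35 pairs of [2, 121], then 33 x 121.
OBS_NVEC = [100, 3] + [2, 121] * 35 + [121] * 33

def one_hot_observation(obs, nvec=OBS_NVEC):
    """Concatenate one one-hot vector per MultiDiscrete entry into a flat float list."""
    if len(obs) != len(nvec):
        raise ValueError(f"expected {len(nvec)} entries, got {len(obs)}")
    flat = []
    for value, n in zip(obs, nvec):
        if not 0 <= value < n:
            raise ValueError(f"value {value} out of range for a dimension of size {n}")
        segment = [0.0] * n
        segment[value] = 1.0  # mark the chosen category for this entry
        flat.extend(segment)
    return flat

zero_obs = [0] * len(OBS_NVEC)
encoded = one_hot_observation(zero_obs)
print(len(OBS_NVEC), len(encoded))  # 105 8401
```

The one-hot input has 8401 features. A smaller alternative is to scale each entry to [0, 1] by dividing by `n - 1`, which keeps 105 features but treats every entry as ordinal.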

I am having a very hard time finding an agent (for example, in keras-rl) that can handle these spaces.

This issue, https://github.com/keras-rl/keras-rl/issues/224, indicates that the keras-rl DDPG agent can handle a multi-discrete action space. However, the model produces float outputs, and I cannot pass those to the step() function because it expects integer actions.
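One common workaround, which is not a built-in keras-rl feature, is to let the actor output one continuous value per dimension, bounded to [-1, 1] by tanh, and then convert each value to an integer index before calling `step()`. This is a minimal sketch that assumes a tanh-bounded actor output. `continuous_to_multidiscrete` is a hypothetical helper.

```python
def continuous_to_multidiscrete(raw_action, nvec):
    """Map each float in [-1, 1] to an integer index in [0, n - 1]."""
    if len(raw_action) != len(nvec):
        raise ValueError("action and nvec lengths differ")
    actions = []
    for x, n in zip(raw_action, nvec):
        x = min(max(float(x), -1.0), 1.0)   # clip: exploration noise can push x outside [-1, 1]
        scaled = (x + 1.0) / 2.0 * (n - 1)  # linear rescale to [0, n - 1]
        actions.append(int(round(scaled)))  # nearest integer index (ties go to even)
    return actions

nvec = [3, 121, 121, 121]
print(continuous_to_multidiscrete([-1.0, 0.0, 1.0, 0.5], nvec))  # [0, 60, 120, 90]
```

This conversion could live in a `gym.ActionWrapper` or in a custom keras-rl `Processor` (its `process_action` method), so the agent never sees integers. Be aware that it treats every sub-action as ordinal. If the 3 or 121 choices are unordered categories, nearby float values will not correspond to similar actions.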

Most other agents seem to use a tanh activation on the output layer, or some layer that produces a binary output. I need an output with the same shape as my action space: one integer per dimension.
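Instead of tanh, the output layer can produce a flat vector of `sum(nvec)` logits (6222 for the action space above). That vector is split into one segment per sub-action, and an argmax or softmax sample is taken from each segment. This is the factored approach, with one categorical distribution per dimension. The pure-Python sketch below covers only the decoding step and assumes the network already produces that flat vector. The function names are hypothetical.

```python
import math
import random

def split_logits(flat_logits, nvec):
    """Cut a flat list of sum(nvec) logits into one segment per sub-action."""
    if len(flat_logits) != sum(nvec):
        raise ValueError(f"expected {sum(nvec)} logits, got {len(flat_logits)}")
    segments, start = [], 0
    for n in nvec:
        segments.append(flat_logits[start:start + n])
        start += n
    return segments

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_action(flat_logits, nvec, greedy=True, rng=random):
    """Return one integer per dimension: argmax (greedy) or a softmax sample."""
    action = []
    for seg in split_logits(flat_logits, nvec):
        if greedy:
            action.append(max(range(len(seg)), key=seg.__getitem__))
        else:
            action.append(rng.choices(range(len(seg)), weights=softmax(seg))[0])
    return action

nvec = [3, 121, 121, 121] * 17
logits = [random.gauss(0.0, 1.0) for _ in range(sum(nvec))]
greedy_action = decode_action(logits, nvec)
sampled_action = decode_action(logits, nvec, greedy=False)
print(len(greedy_action), len(sampled_action))  # 68 68
```

With this decomposition, the log-probability of a joint action is the sum of the per-segment log-probabilities. A policy-gradient loss would use that sum.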

How can this be handled?


Licensed under: CC-BY-SA with attribution