Question

I am trying to use a reinforcement learning solution in an OpenAI Gym environment that has 6 discrete actions with continuous values, e.g. increase parameter 1 by 2.2, decrease parameter 1 by 1.6, decrease parameter 3 by 1, etc.

I have seen in this code that such an action space was implemented as a single continuous space, where the first value is rounded to a discrete action (e.g. action 0 if it is below 1, action 1 if it is between 1 and 2, and so on).
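
For illustration, here is roughly what I understand that approach to look like; the bounds and the flooring of the first component are my own guesses, not taken from that code:

```python
import numpy as np
from gym.spaces import Box

# A single Box: the first component encodes which of the 6 discrete actions to take,
# the second component carries the continuous value for that action.
action_space = Box(low=np.array([0.0, -10.0]), high=np.array([6.0, 10.0]))

raw_action = action_space.sample()
discrete_action = min(int(raw_action[0]), 5)  # floor the first component into {0, ..., 5}
continuous_value = raw_action[1]
```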

Does anybody know if the above solution is the correct way to implement such an action space? Or does Gym offer another way?

Solution

Here is a sample environment which demonstrates this. It relies on the environment to pick out the correct continuous control value for the chosen discrete action.

```python
import gym
from gym.spaces import Dict, Discrete, Box, Tuple
import numpy as np


class SampleGym(gym.Env):
    def __init__(self, config={}):
        self.config = config
        # Tuple action space: a discrete choice plus a Box holding one continuous value per choice
        self.action_space = Tuple((Discrete(2), Box(-10, 10, (2,))))
        self.observation_space = Box(-10, 10, (2, 2))
        self.p_done = config.get("p_done", 0.1)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # action is a (discrete_choice, continuous_values) tuple;
        # only the continuous entry matching the chosen discrete action is used
        chosen_action = action[0]
        cnt_control = action[1][chosen_action]

        if chosen_action == 0:
            reward = cnt_control
        else:
            reward = -cnt_control - 1

        print(f"Action {chosen_action}, continuous ctrl {cnt_control}")
        return (
            self.observation_space.sample(),
            reward,
            bool(np.random.choice([True, False], p=[self.p_done, 1.0 - self.p_done])),
            {},
        )


if __name__ == "__main__":
    env = SampleGym()
    env.reset()
    env.step((1, [-1, 2.1]))  # should say use action 1 with 2.1
    env.step((0, [-1.1, 2.1]))  # should say use action 0 with -1.1
```
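
To match the six actions from the question, the same pattern extends directly; the bounds below (0 to 10 for the continuous amounts) are placeholders I picked for illustration:

```python
from gym.spaces import Tuple, Discrete, Box

# Discrete(6): which of the six parameter adjustments to apply
# Box(..., shape=(6,)): the continuous amount associated with each of the six actions
action_space = Tuple((Discrete(6), Box(low=0.0, high=10.0, shape=(6,))))

which, amounts = action_space.sample()
amount = amounts[which]  # only the entry for the chosen action matters, as in step() above
```

Note that not every RL library handles a Tuple (hybrid) action space out of the box, so it is worth checking whether the algorithm implementation you plan to use supports it.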