Do RL agents learn the optimal “degree” of an action to take?
02-11-2019
Question
I have a game environment I want to train an RL model on. This environment has 2 fundamental actions that the agent can take: "Left" or "Right" (say, 0 or 1).
However, the actions "Left" and "Right" can each be taken in a discrete number of "degrees". For example, I can take action "Left" with degree 70%, or take action "Right" with degree 16%.
Assume a discrete range of 0-100% for each of "Left" and "Right", making the total action space a discrete set of size 201 (indices 0-200 in increments of 1). Does an agent learn the optimal degree to take for either "Left" or "Right" in any given state?
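To make the setup concrete, here is a minimal Python sketch of one way the 201 actions could be indexed. The layout is an assumption, not something fixed by the environment: index 100 is treated as a shared 0% (neutral) action, which is what makes two 0-100% ranges add up to 201 rather than 202. The names `decode_action` and `encode_action` are hypothetical helpers for illustration only.

```python
from typing import Tuple

N_ACTIONS = 201  # flat indices 0..200, as described in the question


def decode_action(index: int) -> Tuple[str, int]:
    """Map a flat action index to (direction, degree in percent).

    Assumed layout (not specified in the question):
      0..99    -> Left at 100%..1%
      100      -> neutral, i.e. 0% in either direction
      101..200 -> Right at 1%..100%
    """
    if not 0 <= index < N_ACTIONS:
        raise ValueError(f"action index must be in [0, {N_ACTIONS - 1}]")
    signed = index - 100  # negative means Left, positive means Right
    if signed < 0:
        return ("Left", -signed)
    if signed > 0:
        return ("Right", signed)
    return ("Neutral", 0)


def encode_action(direction: str, degree: int) -> int:
    """Inverse of decode_action under the same assumed layout."""
    if not 0 <= degree <= 100:
        raise ValueError("degree must be in [0, 100]")
    if degree == 0:
        return 100  # 0% Left and 0% Right collapse to one action
    if direction == "Left":
        return 100 - degree
    if direction == "Right":
        return 100 + degree
    raise ValueError("direction must be 'Left' or 'Right'")


if __name__ == "__main__":
    print(encode_action("Left", 70))   # "Left" with degree 70%
    print(decode_action(116))          # "Right" with degree 16%
```

With this encoding, a policy over a single discrete space of size 201 picks the direction and the degree in one choice, so the question becomes whether the agent learns the best of the 201 indices in each state.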
No correct solution