Question

I am trying to implement a Q-learning algorithm for energy optimization. The environment is a finite MDP whose states are 6-dimensional vectors of integers, and the number of discrete values at each index of the state vector ranges from 24 to 90.
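For instance, I could imagine representing each state either as a tuple key or as a single flat index (a rough sketch of what I have in mind; the per-dimension sizes below are just placeholders, since only the 24–90 range is fixed):

```python
import numpy as np

# Hypothetical per-dimension sizes for the 6-dimensional state vector
# (each dimension actually has somewhere between 24 and 90 discrete values).
DIM_SIZES = (24, 36, 48, 60, 72, 90)

def state_to_index(state):
    """Flatten a 6-dimensional integer state vector into a single index."""
    return int(np.ravel_multi_index(state, DIM_SIZES))

def index_to_state(index):
    """Recover the 6-dimensional state vector from a flat index."""
    return tuple(int(i) for i in np.unravel_index(index, DIM_SIZES))

# Example: a state given as a vector of integers within each dimension's range.
s = (3, 10, 0, 59, 7, 42)
idx = state_to_index(s)
assert index_to_state(idx) == s
```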

The action space varies from state to state: some states have fewer than 15 possible actions, while others have up to 300.
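One idea I am considering (just a sketch, since I am not sure it is the usual way) is to keep an explicit mapping from each state to its list of valid actions, and build the Q-table lazily so that only visited, valid pairs ever appear in it; the example states and action lists below are made up:

```python
from collections import defaultdict

# Hypothetical mapping from state (as a tuple) to its valid actions.
# In practice this would be derived from the environment's rules.
valid_actions = {
    (3, 10, 0, 59, 7, 42): [0, 1, 2, 7, 11],
    (5, 2, 17, 30, 1, 8): list(range(300)),   # a state with many actions
}

# Q-table as a nested dict: Q[state][action], created lazily, so
# impossible state/action combinations are simply never stored.
Q = defaultdict(lambda: defaultdict(float))

def greedy_action(state):
    """Pick the valid action with the highest Q-value for this state."""
    return max(valid_actions[state], key=lambda a: Q[state][a])
```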

If I make some simplifying assumptions (just for the purpose of testing the model), I can reduce this to about 400 states and fewer than 200 actions.
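In that reduced setting I could even picture a dense NumPy table with a boolean validity mask, along these lines (the sizes match my assumption above, but the random mask is purely illustrative):

```python
import numpy as np

N_STATES, N_ACTIONS = 400, 200          # reduced sizes assumed for testing

# Boolean mask marking which actions are valid in which states.
# Random here purely for illustration; it would normally come from
# the environment's constraints.
rng = np.random.default_rng(0)
action_mask = rng.random((N_STATES, N_ACTIONS)) < 0.3

# Dense Q-table; invalid entries are never read because action
# selection is always restricted by the mask.
Q = np.zeros((N_STATES, N_ACTIONS))

def greedy_action(state_idx):
    """Return the valid action with the highest Q-value in this state."""
    q_row = np.where(action_mask[state_idx], Q[state_idx], -np.inf)
    return int(np.argmax(q_row))
```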

How can I construct a Q-table for such an environment in Python? I am not sure how to prevent the table from containing lots of impossible state/action combinations, or how to stop the agent from trying to take those invalid actions.
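Concretely, something like the following is what I have in mind, though I do not know whether it is the right approach; the hyperparameters and the `valid_actions` mapping are placeholders like the ones above:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # placeholder hyperparameters

# Lazily-created Q-table keyed by (state tuple, action); only pairs
# that are actually visited ever appear in it.
Q = defaultdict(float)

def choose_action(state, valid_actions):
    """Epsilon-greedy selection restricted to the state's valid actions."""
    actions = valid_actions[state]
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, valid_actions, done):
    """Standard Q-learning update, maximizing only over valid next actions."""
    if done or not valid_actions[next_state]:
        target = reward
    else:
        target = reward + GAMMA * max(Q[(next_state, a)]
                                      for a in valid_actions[next_state])
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```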

No correct solution
