Question

I'm trying to pose a problem as a reinforcement learning problem. My difficulty is that the agent's state changes randomly; the agent simply has to choose an action in whatever state it finds itself in. I want to learn appropriate actions for all states based on the rewards received for performing actions.

Question:

Is this a specific type of RL problem? If there is no successor state, how would one calculate the value of a state?

Solution

If the state really changes randomly, that is, if there is no relationship between the action and the following state, then all you can do is record and average the rewards for each action in each state.
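As a rough illustration of that averaging, here is a minimal sketch in Python. The class name `AveragingAgent`, the epsilon-greedy exploration, and the default `epsilon=0.1` are assumptions made for the example, not something from the question; the only part taken from the answer is keeping a per-state, per-action average of the rewards.

```python
import random
from collections import defaultdict


class AveragingAgent:
    """Keeps a running mean reward for every (state, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon            # exploration rate (assumed value)
        self.rng = random.Random(seed)
        self.mean = defaultdict(float)    # (state, action) -> mean reward
        self.count = defaultdict(int)     # (state, action) -> times tried

    def choose(self, state):
        # Explore occasionally so every action keeps getting sampled.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward):
        key = (state, action)
        self.count[key] += 1
        # Incremental mean: new = old + (reward - old) / n
        self.mean[key] += (reward - self.mean[key]) / self.count[key]

    def best_action(self, state):
        # Action with the highest average reward seen so far in this state.
        return max(self.actions, key=lambda a: self.mean[(state, a)])
```

In use, the loop would be: observe the current state, call `choose(state)`, receive the reward, and call `update(state, action, reward)`. Because the next state does not depend on the action, no successor-state value is needed.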

Other answers

I've since found that this can be treated as a Monte Carlo reinforcement learning problem. Rather than deriving the value of a state from the values of the states it can transition to, value is assigned to a state directly from the outcomes observed when following a policy from that state. This is useful when the dynamics of the state transition function are unknown, or highly stochastic and difficult to model.
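A small sketch of that idea, assuming experience is stored as episodes of `(state, reward)` pairs gathered under a fixed policy. The function name `mc_state_values`, the data layout, and the every-visit variant are choices made for this example only.

```python
from collections import defaultdict


def mc_state_values(episodes, gamma=1.0):
    """Every-visit Monte Carlo: average the return observed after each visit.

    `episodes` is a list of episodes; each episode is a list of
    (state, reward) pairs, where reward is what was received after
    acting in that state under the policy being evaluated.
    """
    total = defaultdict(float)   # state -> sum of observed returns
    visits = defaultdict(int)    # state -> number of visits
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return can be accumulated in one pass.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            total[state] += g
            visits[state] += 1
    return {s: total[s] / visits[s] for s in total}
```

No transition model appears anywhere in the function: values come only from sampled outcomes. With `gamma=0.0` the return collapses to the immediate reward, so the estimate is just the average reward received in each state.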

https://en.wikipedia.org/wiki/Reinforcement_learning

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow