Q-learning when minimising a total cost instead of maximising a total reward
02-11-2019
Question
I have a decision problem whose outcomes are measured as a cost that I want to minimise. Q-learning seems like a good fit, but I am not sure how to adapt it to work with a cost instead of a reward.
Which one is better:
- Initializing the Q-values of all actions to zero, having the agent learn the actions that maximize the Q-values, and later selecting the actions with maximum Q-values. The Q-value update would then be:
q_dict['state1']['act1'] += r + max(q_dict['state2'].values())
- Initializing the Q-values to a large number, having the agent learn the actions that minimize the Q-values, and later selecting the actions with minimum Q-values. The Q-value update would then be:
q_dict['state1']['act1'] -= r + max(q_dict['state2'].values())
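For reference, neither update above is the standard tabular Q-learning form, which uses a learning rate and a discount factor and moves the current estimate toward a bootstrap target. For a cost-based problem, one common adaptation is to let Q(s, a) estimate the expected total cost, replace the max in the target with a min, and act greedily with argmin. A minimal sketch, reusing the nested-dict layout from the snippets above; the values of `alpha`, `gamma` and the toy transition are illustrative assumptions, not from the question:

```python
# Cost-minimising tabular Q-learning sketch: Q(s, a) estimates expected
# total cost, so the Bellman target bootstraps with min over next-state
# actions and the greedy policy picks the argmin.
alpha, gamma = 0.5, 0.9  # illustrative learning rate and discount

q = {'state1': {'act1': 0.0, 'act2': 0.0},
     'state2': {'act1': 0.0, 'act2': 0.0}}

def update(q, s, a, cost, s_next):
    # min (not max) because lower accumulated cost is better
    target = cost + gamma * min(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])  # move estimate toward target

# One illustrative transition: act1 in state1 incurs cost 2.0
# and lands the agent in state2.
update(q, 'state1', 'act1', 2.0, 'state2')

# Greedy action = the one with the smallest estimated cost.
best = min(q['state1'], key=q['state1'].get)
```

Equivalently, you can negate the cost (`r = -cost`) and feed it to an unmodified reward-maximising learner: argmax over Q-values of negated costs selects the same actions as argmin over cost-valued Q-values.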
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange