Calculate Q parameter for Deep Q-Learning applied to videogames
31-10-2019
Question
I am working on Deep Q-learning applied to Snake, and I am confused about the methodology. Based on the DeepMind paper on the topic and other sources, the Q-value needs to be calculated with the Bellman equation as follows:
Q(s, a) = r + γ · max_a' Q(s', a')
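For the tabular case this update can be sketched as follows (a minimal example on a toy MDP; the state/action counts, learning rate, and `q_update` helper are all illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy MDP: 4 discrete states, 2 actions (sizes are illustrative).
n_states, n_actions = 4, 2
gamma = 0.9   # discount factor
alpha = 0.1   # learning rate

Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    # Bellman target: r + gamma * max_a' Q(s', a')
    target = r + gamma * Q[s_next].max()
    # Move Q(s, a) a step toward the target.
    Q[s, a] += alpha * (target - Q[s, a])

# One hypothetical transition: state 0, action 1, reward 1.0, next state 2.
q_update(s=0, a=1, r=1.0, s_next=2)
```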
While calculating the Q-value for a Q-table is straightforward, the process in Deep Q-learning is not clear to me. According to some sources, the possible future states of the current state need to be processed with the deep network, and the highest Q-value is chosen (input: future state, output: Q-value; or input: current state, output: Q-values for future states). Then a table of tuples [state, action, reward, future_state]
is stored in a replay memory to reduce catastrophic forgetting. I don't understand how we get the Q-values to predict in the first place, if we need the states in order to compute the Q-values. Is this approach correct, or am I missing something?
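To make the circularity concrete, here is a rough sketch of one training step under my understanding: the network's own predictions for the next state are used as (moving) targets. The tiny linear "network" `W`, the buffer size, and all shapes are stand-in assumptions for illustration only, not the actual DeepMind setup:

```python
import random
from collections import deque

import numpy as np

# Stand-in for a deep net: a linear map from state vector to one
# Q-value per action (shapes and names are illustrative).
state_dim, n_actions = 3, 2
gamma = 0.9
lr = 0.01
rng = np.random.default_rng(0)
W = rng.normal(size=(n_actions, state_dim)) * 0.1

# Replay memory of (state, action, reward, next_state, done) tuples.
replay = deque(maxlen=10_000)

def q_values(state):
    # The network takes a state and outputs one Q-value per action.
    return W @ state

def train_step(batch_size=4):
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for s, a, r, s_next, done in batch:
        # Bellman target uses the network's *own prediction* for the
        # next state -- this bootstrapping is where the Q-values
        # "come from" before training converges.
        target = r if done else r + gamma * q_values(s_next).max()
        pred = q_values(s)[a]
        # Gradient step on squared error, for the taken action only.
        grad = (pred - target) * s   # d(pred)/dW[a] = s
        W[a] -= lr * grad

# Fill the buffer with a few fake transitions, then do one step.
for _ in range(20):
    s = rng.normal(size=state_dim)
    a = int(rng.integers(n_actions))
    s_next = rng.normal(size=state_dim)
    replay.append((s, a, 1.0, s_next, False))
train_step()
```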
Thank you!