Question

I am working on Deep Q-learning applied to Snake, and I am confused about the methodology. Based on the DeepMind paper and other sources, the Q-value is updated with the Bellman equation as follows:

Q(s,a) = r + γ · max_a' Q(s',a')
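
For concreteness, with made-up numbers (γ = 0.9, a reward of 1 for eating food, and four Q-value estimates for the next state that I invented), this is how I understand the target would be computed:

```python
gamma = 0.9
r = 1.0
q_next = [0.2, 0.7, 0.1, 0.4]  # network's estimates of Q(s', a') for the four Snake moves

target = r + gamma * max(q_next)  # 1 + 0.9 * 0.7 = 1.63
print(target)
```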

While calculating the Q-value for a Q-table is straightforward, the process in Deep Q-learning is not clear to me. According to some sources, the possible future states of the current state are fed through the network and the action with the highest Q-value is chosen (either input: a future state, output: its Q-value; or input: the current state, output: one Q-value per action). Then, tuples of [state, action, reward, future_state] are stored in a replay memory to reduce catastrophic forgetting. What I don't understand is how we obtain the Q-values to train against in the first place, if we need the network itself to compute those Q-values. Is this approach correct, or am I missing something? A sketch of my current understanding follows below.
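
To make my question concrete, here is a minimal sketch of what I believe the training step looks like (PyTorch; all layer sizes, names, and hyperparameters are my own placeholders, not from the DeepMind paper, and I have omitted the paper's separate target network for brevity):

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Small network: input = state features, output = one Q-value per action.
# Sizes are placeholders I chose for illustration.
n_state_features, n_actions = 12, 4
q_net = nn.Sequential(nn.Linear(n_state_features, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)  # stores (state, action, reward, next_state, done)
gamma = 0.99

def train_step(batch_size=32):
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.stack(states)
    next_states = torch.stack(next_states)
    actions = torch.tensor(actions)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q(s, a): predict Q-values for all actions, keep the ones actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target: r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states.
    # (The DeepMind paper computes this with a separate target network; omitted here.)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values
    target = rewards + gamma * (1 - dones) * q_next

    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In my understanding, after each environment step you would append the transition to memory and call train_step(), so the targets are bootstrapped from the network's own (initially random) predictions.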

Thank you!

