Question

I found this resource that explains Q-learning with a very simple example. Making it a 2D problem, a rectangle instead of a line, keeps it simple: the only difference is that there are now two more possible actions (up and down).
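For concreteness, here is a minimal sketch of the tabular version on that rectangle. The grid size, start cell, treasure location, rewards, and hyperparameters are all my own assumptions, not taken from the linked resource:

```python
import random

WIDTH, HEIGHT = 5, 4                      # fixed rectangle (assumption)
TREASURE = (WIDTH - 1, HEIGHT - 1)        # goal cell (assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # left, right, up, down

ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.1, 500

# Q-table keyed on absolute (x, y) cells, one value per action.
Q = {(x, y): [0.0] * 4 for x in range(WIDTH) for y in range(HEIGHT)}

def step(state, a):
    dx, dy = ACTIONS[a]
    nx = min(max(state[0] + dx, 0), WIDTH - 1)   # clamp to the rectangle
    ny = min(max(state[1] + dy, 0), HEIGHT - 1)
    nxt = (nx, ny)
    return nxt, (1.0 if nxt == TREASURE else 0.0), nxt == TREASURE

for _ in range(EPISODES):
    s = (0, 0)                            # fixed start cell (assumption)
    for _ in range(1000):                 # step cap keeps early episodes bounded
        if random.random() < EPSILON:     # epsilon-greedy exploration
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # standard tabular Q-learning update
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break
```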

My question is: if the length and height of the rectangle are random, as well as the starting position and the location of the treasure, how can the bot apply the knowledge it acquired to the new problem? Is there an evolved version of Q-learning for problems with dynamic state spaces?
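To make the difficulty concrete: the table above is keyed on absolute (x, y) cells, so nothing learned on one rectangle transfers to a rectangle of a different size. One way to get transfer, sketched below purely as my own illustration (not from the linked resource), is to key the table on state features that stay meaningful across layouts, such as the relative offset to the treasure. The fully general version of this idea is Q-learning with function approximation, e.g. Deep Q-Networks, where a network maps state features to Q-values:

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # left, right, up, down
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# One shared table, keyed on the (dx, dy) offset to the treasure rather than
# on absolute cells, so what is learned on one rectangle applies to another.
Q = {}

def q_values(offset):
    return Q.setdefault(offset, [0.0] * 4)

def run_episode(width, height, start, treasure):
    x, y = start
    for _ in range(4 * width * height):        # step cap bounds episode length
        if (x, y) == treasure:
            break
        off = (treasure[0] - x, treasure[1] - y)
        if random.random() < EPSILON:          # epsilon-greedy exploration
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: q_values(off)[i])
        dx, dy = ACTIONS[a]
        nx = min(max(x + dx, 0), width - 1)    # clamp to this rectangle
        ny = min(max(y + dy, 0), height - 1)
        r = 1.0 if (nx, ny) == treasure else 0.0
        nxt = (treasure[0] - nx, treasure[1] - ny)
        target = r if (nx, ny) == treasure else r + GAMMA * max(q_values(nxt))
        q_values(off)[a] += ALPHA * (target - q_values(off)[a])
        x, y = nx, ny

# Every episode randomizes the rectangle, the start, and the treasure,
# yet all of them update the same offset-keyed table.
for _ in range(300):
    w, h = random.randint(3, 10), random.randint(3, 10)
    run_episode(w, h,
                (random.randrange(w), random.randrange(h)),
                (random.randrange(w), random.randrange(h)))
```

Note that the offset feature ignores the walls, so this abstraction is only approximate near the edges; function approximation handles such cases more gracefully.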

No correct solution

Licensed under: CC-BY-SA with attribution