When to stop calculating values of each cell in the grid in Reinforcement Learning (dynamic programming) applied to gridworld
16-10-2019
Question
Consider applying Reinforcement Learning (specifically, the dynamic programming method of value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its current value and the values of the states reachable by taking actions from that state. Now:
- How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the present value function is minimal? I am not able to understand how to implement the stopping mechanism in the grid-world scenario (discounting not considered).
- Is the value function the set of values of all the cells in the grid world?
Solution
1 - You should set a threshold (a hyperparameter) that tells you when to quit the loop.
Let V be the values for all states s, and V' the new values after one sweep of value iteration.
If $\sum_s |V(s) - V'(s)| \le \text{threshold}$, quit.
2 - Yes, V is a function defined over every cell in the grid, because you need to update every cell.
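As a minimal sketch of this stopping rule, here is value iteration on a small deterministic grid world. The grid layout, reward placement, discount factor, and the `theta` threshold value are all illustrative choices, not from the question:

```python
import numpy as np

def value_iteration(rewards, gamma=0.9, theta=1e-6):
    """Run value iteration until the summed absolute change in V
    drops below theta. rewards is a 2-D array of per-cell rewards."""
    n_rows, n_cols = rewards.shape
    V = np.zeros_like(rewards, dtype=float)
    while True:
        V_new = np.zeros_like(V)
        for r in range(n_rows):
            for c in range(n_cols):
                # Evaluate the four moves; a move off the grid
                # leaves the agent in place (clamped indices).
                candidates = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr = min(max(r + dr, 0), n_rows - 1)
                    nc = min(max(c + dc, 0), n_cols - 1)
                    candidates.append(rewards[nr, nc] + gamma * V[nr, nc])
                V_new[r, c] = max(candidates)  # greedy backup over actions
        # Stopping rule from the answer: total absolute change <= threshold.
        if np.sum(np.abs(V - V_new)) <= theta:
            return V_new
        V = V_new

# Hypothetical 3x3 grid with a single rewarding goal cell.
rewards = np.zeros((3, 3))
rewards[2, 2] = 1.0
V = value_iteration(rewards)
```

After convergence, values decay geometrically with distance from the goal cell, and the loop exits the first time a full sweep changes the value function by no more than `theta` in total.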
Hope it helps.
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange