When to stop calculating values of each cell in the grid in Reinforcement Learning (dynamic programming) applied to gridworld
16-10-2019
Question
Consider applying Reinforcement Learning (specifically, the dynamic programming method of value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its current value and the values of the states reachable by taking actions from that state. Now:
- How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the present value function is minimal? I am not able to understand how to implement the stopping mechanism in the grid-world scenario (discounting not considered).
- Is the value function the set of values of all the cells in the grid world?
Solution
1 - You should set a threshold (a hyperparameter) that tells you when to quit the loop.
Let V be the values for all states s, and V' the new values after one sweep of value iteration.
If $\sum_s |V(s) - V'(s)| \le \text{threshold}$, quit.
2 - Yes, V is a function defined over every cell in the grid, because you need to update every cell.
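As a minimal sketch of this stopping rule, here is value iteration on a small deterministic grid world. The grid layout, reward placement, discount factor, and the `theta` threshold value are all illustrative choices, not from the question:

```python
import numpy as np

def value_iteration(rewards, gamma=0.9, theta=1e-6):
    """Run value iteration until the summed absolute change in V
    drops below theta. rewards is a 2-D array of per-cell rewards."""
    n_rows, n_cols = rewards.shape
    V = np.zeros_like(rewards, dtype=float)
    while True:
        V_new = np.zeros_like(V)
        for r in range(n_rows):
            for c in range(n_cols):
                # Evaluate the four moves; a move off the grid
                # leaves the agent in place (clamped indices).
                candidates = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr = min(max(r + dr, 0), n_rows - 1)
                    nc = min(max(c + dc, 0), n_cols - 1)
                    candidates.append(rewards[nr, nc] + gamma * V[nr, nc])
                V_new[r, c] = max(candidates)  # greedy backup over actions
        # Stopping rule from the answer: total absolute change <= threshold.
        if np.sum(np.abs(V - V_new)) <= theta:
            return V_new
        V = V_new

# Hypothetical 3x3 grid with a single rewarding goal cell.
rewards = np.zeros((3, 3))
rewards[2, 2] = 1.0
V = value_iteration(rewards)
```

After convergence, values decay geometrically with distance from the goal cell, and the loop exits the first time a full sweep changes the value function by no more than `theta` in total.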
Hope it helps.
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange