When to stop calculating values of each cell in the grid in Reinforcement Learning (dynamic programming) applied to gridworld

datascience.stackexchange https://datascience.stackexchange.com/questions/6700

Question

Considering the application of reinforcement learning (a dynamic programming method performing value iteration) on a gridworld: in each iteration, I go through every cell of the grid and update its value based on its current value and the value of the state reached by taking an action from that state. Now:

  1. How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the current value function becomes negligible? I am not able to understand how to implement the stopping mechanism in the gridworld scenario (discount not considered).

  2. Is the value function simply the collection of values of all the cells in the gridworld?

Solution

1 - You should set a threshold (a hyperparameter) that tells you when to quit the loop.

Let $V$ be the values over all states $s$ and $V'$ the new values after one sweep of value iteration.

if $\sum_s |V(s) - V'(s)| \le \text{threshold}$, quit

2 - Yes, $V$ is a function defined over every cell in the grid, because you need to update every cell.
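Here is a minimal sketch of both points, assuming a 4x4 deterministic gridworld with a single terminal goal cell, a reward of -1 per step, and no discount (matching the question's "discount not considered" setting). The grid size, rewards, and the particular threshold value are illustrative choices, not part of the answer above.

```python
import numpy as np

ROWS, COLS = 4, 4
GOAL = (0, 0)                                  # terminal cell: its value stays 0
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STEP_REWARD = -1.0
THRESHOLD = 1e-4                               # the hyperparameter from the answer

def step(state, action):
    """Deterministic transition; moves that would leave the grid keep the agent in place."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    return (r, c)

V = np.zeros((ROWS, COLS))                     # the value function: one number per grid cell

while True:
    V_new = np.zeros_like(V)
    for r in range(ROWS):
        for c in range(COLS):
            if (r, c) == GOAL:
                continue                       # terminal state keeps value 0
            # Value-iteration backup: best action under the current V.
            V_new[r, c] = max(STEP_REWARD + V[step((r, c), a)] for a in ACTIONS)
    # Stopping rule from the answer: total change across all cells below the threshold.
    if np.sum(np.abs(V - V_new)) <= THRESHOLD:
        V = V_new
        break
    V = V_new

print(V)   # each entry ends up as the negative number of steps to the goal
```

On this deterministic gridworld the values stop changing after a finite number of sweeps, so the summed difference drops to zero and the loop exits; with a discount factor or stochastic transitions the differences only shrink toward zero, which is why the threshold is needed rather than an exact-equality check.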

Hope it helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange