Question
Why is the termination condition of the value-iteration algorithm (for example, http://aima-java.googlecode.com/svn/trunk/aima-core/src/main/java/aima/core/probability/mdp/search/ValueIteration.java) for an MDP (Markov Decision Process)
||Ui+1 - Ui|| < error * (1 - gamma) / gamma, where
Ui is the vector of utilities at iteration i,
Ui+1 is the updated vector of utilities,
error is the error bound used in the algorithm,
gamma is the discount factor used in the algorithm?
Where does "error * (1 - gamma) / gamma" come from? Is the "divided by gamma" there because every step is discounted by gamma? But why the factor error * (1 - gamma)? And how large should error be?
Solution
That quantity, ||Ui+1 - Ui||, is called the Bellman Error or Bellman Residual.
See Williams and Baird, 1993 for its use in MDPs.
See Littman, 1994 for its use in POMDPs.
As for where the bound comes from: the Bellman update is a contraction by a factor of gamma in the max norm, so each iteration shrinks the distance to the true utility vector U by at least gamma: ||Ui+1 - U|| <= gamma * ||Ui - U||. By the triangle inequality, ||Ui - U|| <= ||Ui+1 - Ui|| + ||Ui+1 - U|| <= ||Ui+1 - Ui|| + gamma * ||Ui - U||, which rearranges to ||Ui - U|| <= ||Ui+1 - Ui|| / (1 - gamma), and hence ||Ui+1 - U|| <= gamma * ||Ui+1 - Ui|| / (1 - gamma). Therefore, if the residual ||Ui+1 - Ui|| drops below error * (1 - gamma) / gamma, the returned utilities are guaranteed to be within error of the true utilities. The error bound itself is a parameter you choose: it is the largest deviation from the true utilities you are willing to tolerate.
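To make the termination test concrete, here is a minimal sketch of value iteration using that stopping condition. The MDP is a hypothetical two-state example invented for illustration (not the AIMA classes from the linked code): state 0 has a single action that reaches absorbing state 1 with probability 0.8 and stays put with probability 0.2, with an assumed AIMA-style step cost of -0.04.

```java
// Sketch of value iteration on a toy two-state MDP, terminating when the
// Bellman residual drops below error * (1 - gamma) / gamma.
public class ValueIterationDemo {

    static double[] solve(double gamma, double error) {
        double[] reward = {-0.04, 1.0}; // assumed rewards; state 1 is absorbing
        double[] U = new double[2];     // utilities, initialized to zero
        double delta;                   // max-norm change = Bellman residual
        do {
            double[] Unext = new double[2];
            // Bellman update for the single available action in each state.
            Unext[0] = reward[0] + gamma * (0.8 * U[1] + 0.2 * U[0]);
            Unext[1] = reward[1];       // absorbing state: utility = its reward
            delta = 0.0;
            for (int s = 0; s < U.length; s++) {
                delta = Math.max(delta, Math.abs(Unext[s] - U[s]));
            }
            U = Unext;
            // Stop once the residual guarantees ||U - Utrue|| < error.
        } while (delta >= error * (1.0 - gamma) / gamma);
        return U;
    }

    public static void main(String[] args) {
        double[] U = solve(0.9, 1e-4);
        System.out.printf("U(0)=%.6f, U(1)=%.6f%n", U[0], U[1]);
    }
}
```

With gamma = 0.9 and error = 1e-4, the loop runs until the per-iteration change is below 1e-4 * 0.1 / 0.9, at which point U is within 1e-4 of the exact fixed point (here U(0) solves U(0) = -0.04 + 0.9 * (0.8 + 0.2 * U(0))).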