Question

Could anybody help to explain how to following value function been generated, the problem and solution are attached, I just don't know how the solution is generated. thank you! problem

solution

STILL NEED HELP WITH THIS!!!

Was it helpful?

Solution

Since no one else has taken a stab at it, I'll present my understanding of the problem (disclaimer: I'm not an expert on reinforced learning and I'm posting this as an answer because it's too long to be a comment)

Think of it this way: when starting at, for example, node d, a random walker has a 50% chance to jump to either node e or node a. Each such jump reduces the reward (r) with the multiplier y (gamma in the picture). You continue jumping around until you get to the target node (f in this case), after which you collect the reward r.

If I've understood correctly, the two smaller 3x2 squares represent the expected values of reward when starting from each node. Now, it's obvious why in the first 3x2 square every node has a value of 100: because y = 1, the reward never decreases. You can just keep jumping around until you eventually end up in the reward node, and gather the reward of r=100.

However, in the second 3x2 square, with every jump the reward is decreased with a multiplier of 0.9. So, to get the expected value of reward when starting from square c, you sum together the reward you get from different paths, multiplied by their probabilities. Going from c to f has a chance of 50% and it's 1 jump, so you get r = 0.5*0.9^0*100 = 50. Then there's the path c-b-e-f: 0.5*(1/3)*(1/3)*0.9^2*100 = 4.5. Then there's c-b-c-f: 0.9^2*0.5^2*(1/3)^1*100 = 6.75. You keep going this way until the reward from the path you're examining is insignificantly small, and sum together the rewards from all the paths. This should give you the result of the corresponding node, that is, 50+6.75+4.5+... = 76.

I guess the programmatic way of doing to would be to use a modified dfs/bfs to explore all the paths of length N or less, and sum together the rewards from those paths (N chosen so that 0.9^N is small).

Again, take this with a grain of salt; I'm not an expert on reinforced learning.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top