Markov decision process questions [closed]
23-09-2019
Problem
Figure (grid-world MDP): http://img693.imageshack.us/img693/724/markov.png
I'm a bit confused about some points here:
- What does it mean to say that an action will be successful 70% of the time the robot tries it? Does it mean that every time it tries to perform an action A, it will actually perform A 70% of the time and, the other 30% of the time, perform some action that leaves it in the same state? Or is it as if it always performed action A, but 30% of the time the action simply has no effect? I hope I am making myself clear.
- How is it possible to have several consecutive states with the same utility? In theory, shouldn't the utility always decrease the farther you are from states with a reward?
- Knowing only the information given above, is it possible to infer the discount factor (gamma)? If so, how?
- Is it possible to calculate the reward for the states? How?
Solution
Other tips
ad.1) It probably does not mean that the robot always moves; those 30% are cases like "ah, I'll rest a bit now" or "there was no power to move at all", i.e. the action fails and the robot stays in its current state.
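Under that reading, the transition model can be sketched as follows. This is a minimal illustration with assumed details (a 1-D chain of states and moves of +1/-1; the original problem's grid is not reproduced here): the intended move succeeds with probability 0.7, and otherwise the robot stays put.

```python
import random

# Sketch under assumed details: a 1-D chain of `n_states` states where the
# intended move (+1 or -1) succeeds with probability `p_success`, and the
# robot stays in its current state otherwise ("the action fails" reading).
def step(state, action, n_states=4, p_success=0.7):
    """Return the next state after attempting `action` (+1 or -1)."""
    if random.random() < p_success:
        # Successful move, clipped at the ends of the chain.
        return min(max(state + action, 0), n_states - 1)
    return state  # action fails: robot stays where it is
```

Sampling `step(1, +1)` many times should land in state 2 roughly 70% of the time, which is the key point of the interpretation: failure does not pick a different direction, it simply leaves the state unchanged.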
I've formulated this problem as a Finite-Horizon Markov Decision Process and solved it via Policy Iteration. To the right of each iteration, there is a color-coded grid representation of the recommended actions for each state as well as the original reward grid/matrix.
Review the final policy/strategy at Stage 4. Does it agree with your intuition?
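For reference, Policy Iteration itself is short enough to sketch. This is not the poster's finite-horizon setup or their reward grid; it is an illustrative infinite-horizon discounted version on an assumed toy 4-state chain (reward 1 in the last state, moves succeed with probability 0.7, gamma = 0.9 chosen arbitrarily):

```python
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9  # toy sizes and discount, chosen for illustration
# P[a, s, s2] = probability of landing in s2 after taking action a in s:
# the move (left/right) succeeds with prob 0.7, otherwise the agent stays.
P = np.zeros((n_a, n_s, n_s))
for s in range(n_s):
    for a, d in enumerate((-1, +1)):
        s2 = min(max(s + d, 0), n_s - 1)
        P[a, s, s2] += 0.7
        P[a, s, s] += 0.3
R = np.array([0.0, 0.0, 0.0, 1.0])  # reward for being in the last state

policy = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R
    P_pi = P[policy, np.arange(n_s)]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R)
    # Policy improvement: act greedily with respect to V
    Q = R + gamma * (P @ V)      # Q[a, s]
    new_policy = Q.argmax(axis=0)
    if (new_policy == policy).all():
        break                    # policy stable: optimal for this toy MDP
    policy = new_policy
```

On this chain the loop converges to "always move right", and the utilities V decrease with distance from the rewarded state, which also speaks to question 2 above: utilities only plateau when the reward structure or transition model makes neighboring states equally valuable.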