Markov decision process questions [closed]
23-09-2019
Problem
Figure (grid-world MDP): http://img693.imageshack.us/img693/724/markov.png
I'm a bit confused about some points here:
- What does it mean to say that an action will be successful 70% of the time the robot tries it? Does it mean that every time it tries to perform an action A, it will actually perform A 70% of the time and, the other 30% of the time, perform some action that leaves it in the same state? Or is it as if it always performed action A, but 30% of the time the action simply has no effect? I hope I am making myself clear.
- How is it possible to have several consecutive states with the same utility? In theory, shouldn't the utility always decrease the farther you are from states with a reward?
- Knowing only the information given above, is it possible to infer the discount factor (gamma)? If so, how?
- Is it possible to calculate the reward for the states? How?
Solution
Other tips
ad.1) It probably does not mean that the robot always moves; those 30% are cases like "ah, I'll rest a bit now" or "there was no power to move at all", i.e. the action fails and the robot stays in its current state.
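Under that reading, the transition model can be sketched as follows. This is a minimal illustration with assumed details (a 1-D chain of states and moves of +1/-1; the original problem's grid is not reproduced here): the intended move succeeds with probability 0.7, and otherwise the robot stays put.

```python
import random

# Sketch under assumed details: a 1-D chain of `n_states` states where the
# intended move (+1 or -1) succeeds with probability `p_success`, and the
# robot stays in its current state otherwise ("the action fails" reading).
def step(state, action, n_states=4, p_success=0.7):
    """Return the next state after attempting `action` (+1 or -1)."""
    if random.random() < p_success:
        # Successful move, clipped at the ends of the chain.
        return min(max(state + action, 0), n_states - 1)
    return state  # action fails: robot stays where it is
```

Sampling `step(1, +1)` many times should land in state 2 roughly 70% of the time, which is the key point of the interpretation: failure does not pick a different direction, it simply leaves the state unchanged.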
I've formulated this problem as a Finite-Horizon Markov Decision Process and solved it via Policy Iteration. To the right of each iteration, there is a color-coded grid representation of the recommended actions for each state as well as the original reward grid/matrix.
Review the final policy/strategy at Stage 4. Does it agree with your intuition?
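For reference, Policy Iteration itself is short enough to sketch. This is not the poster's finite-horizon setup or their reward grid; it is an illustrative infinite-horizon discounted version on an assumed toy 4-state chain (reward 1 in the last state, moves succeed with probability 0.7, gamma = 0.9 chosen arbitrarily):

```python
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9  # toy sizes and discount, chosen for illustration
# P[a, s, s2] = probability of landing in s2 after taking action a in s:
# the move (left/right) succeeds with prob 0.7, otherwise the agent stays.
P = np.zeros((n_a, n_s, n_s))
for s in range(n_s):
    for a, d in enumerate((-1, +1)):
        s2 = min(max(s + d, 0), n_s - 1)
        P[a, s, s2] += 0.7
        P[a, s, s] += 0.3
R = np.array([0.0, 0.0, 0.0, 1.0])  # reward for being in the last state

policy = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R
    P_pi = P[policy, np.arange(n_s)]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R)
    # Policy improvement: act greedily with respect to V
    Q = R + gamma * (P @ V)      # Q[a, s]
    new_policy = Q.argmax(axis=0)
    if (new_policy == policy).all():
        break                    # policy stable: optimal for this toy MDP
    policy = new_policy
```

On this chain the loop converges to "always move right", and the utilities V decrease with distance from the rewarded state, which also speaks to question 2 above: utilities only plateau when the reward structure or transition model makes neighboring states equally valuable.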