Question

alt text http://img693.imageshack.us/img693/724/markov.png

I'm a bit confused about some points here:

  1. What does it mean to say that an action succeeds 70% of the time? Does it mean that every time he tries to perform an action A, 70% of the time he actually performs A and the other 30% of the time he performs the action that leads back to the same state? Or is it as if he always chose action A, but 30% of the time he simply doesn't do it? I hope I am making myself clear :(
  2. How is it possible to have several consecutive states with the same utility? In theory, shouldn't the utility always decrease the farther you are from states with a reward?
  3. Knowing only the info I gave above, is it possible to infer the discount factor (gamma)? If yes, how?
  4. Is it possible to calculate the reward for the states? How?

Solution

Check that the account in question has the SQLATIONADMIN (& dbcreator) fixed server roles on the SQL Server instance.

See this document for details.

Other tips

ad.1) Probably the robot does not always have to move, i.e. those 30% are cases like "ah, now I rest a bit" or "there was no power to move at all".
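
Under that reading (a failed action leaves the robot where it is), the utility update could be written roughly as below. This is only a sketch: the symbols U (utility), R (reward), gamma (discount factor) and s'_a (the state that action a is aiming for) are assumed notation, not taken from the original figure.

```latex
% Assumed model: action a reaches its target s'_a with probability 0.7,
% otherwise the robot stays in s with probability 0.3.
U(s) = R(s) + \gamma \max_{a} \bigl[\, 0.7\, U(s'_a) + 0.3\, U(s) \,\bigr]

% If U and R were both known for some state s (and the bracket is nonzero),
% the same equation could be rearranged for the discount factor:
\gamma = \frac{U(s) - R(s)}{\max_{a} \bigl[\, 0.7\, U(s'_a) + 0.3\, U(s) \,\bigr]}
```

The rearrangement only helps if the rewards are known; with utilities alone, R(s) and gamma are both unknowns in a single equation per state.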

I've formulated this problem as a Finite-Horizon Markov Decision Process and solved it via Policy Iteration. To the right of each iteration, there is a color-coded grid representation of the recommended actions for each state as well as the original reward grid/matrix.

Review the final policy/strategy at Stage 4. Does it agree with your intuition?
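
For readers who want to experiment, here is a minimal policy iteration sketch. It is not the setup from the images above: the 1-D corridor, the rewards, and the discount factor GAMMA = 0.9 are all hypothetical, and it uses a discounted infinite-horizon model rather than a finite horizon. The only number taken from the question is the 70% success rate, with a failed action assumed to leave the robot in place.

```python
# Hypothetical 1-D corridor: states 0..4, reward only in the last state.
N_STATES = 5
ACTIONS = (-1, +1)                     # move left / move right
P_SUCCESS = 0.7                        # from the question: 70% success
GAMMA = 0.9                            # assumed discount factor
REWARDS = [0.0, 0.0, 0.0, 0.0, 1.0]    # assumed rewards


def transitions(state, action):
    """Return [(probability, next_state)]; a failed action means staying put."""
    target = min(max(state + action, 0), N_STATES - 1)
    return [(P_SUCCESS, target), (1.0 - P_SUCCESS, state)]


def q_value(state, action, utilities):
    """Reward now plus discounted expected utility of the next state."""
    expected = sum(p * utilities[s2] for p, s2 in transitions(state, action))
    return REWARDS[state] + GAMMA * expected


def evaluate(policy, sweeps=500):
    """Iterative policy evaluation: repeat the Bellman backup for a fixed policy."""
    utilities = [0.0] * N_STATES
    for _ in range(sweeps):
        utilities = [q_value(s, policy[s], utilities) for s in range(N_STATES)]
    return utilities


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [-1] * N_STATES           # arbitrary start: always move left
    while True:
        utilities = evaluate(policy)
        improved = [
            max(ACTIONS, key=lambda a: q_value(s, a, utilities))
            for s in range(N_STATES)
        ]
        if improved == policy:
            return policy, utilities
        policy = improved


if __name__ == "__main__":
    final_policy, final_utilities = policy_iteration()
    print(final_policy)
    print([round(u, 3) for u in final_utilities])
```

In this toy corridor the utilities shrink as you move away from the rewarding state, which is the behaviour question 2 expects; a different reward layout or transition model can of course produce plateaus.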

enter image description here


License: CC-BY-SA with attribution
Not affiliated with StackOverflow