Question

A Markov decision process is typically described as a tuple $\langle A,U,T,R \rangle $ where

  • $A$ is the state space
  • $U$ is the action space
  • $T: A \times U \times A \mapsto [0,\infty) $ is the state transition probability function
  • $R:A \times U \times A \mapsto \mathbb{R}$ is the reward function

What does this $A \times U \times A$ actually mean in terms of the MDP? It is written in all the papers, but never explained. Does it mean that all the states $a \in A$ are multiplied with all the actions $u \in U$? Or something completely different?
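One way to make the notation concrete: $A \times U \times A$ is the Cartesian product, i.e. the set of all triples $(a, u, a')$ of a state, an action, and a successor state, and $T$ assigns a number to each such triple. The sketch below illustrates this reading with a hypothetical two-state, two-action MDP (the state and action names are made up for illustration, not taken from any paper):

```python
# Hypothetical two-state MDP, to illustrate T: A x U x A -> [0, 1]
# as a function evaluated on triples (a, u, a') drawn from the
# Cartesian product A x U x A.
A = ["s0", "s1"]          # state space
U = ["left", "right"]     # action space

# T maps each triple (state, action, next_state) to a probability.
T = {
    ("s0", "left",  "s0"): 0.9, ("s0", "left",  "s1"): 0.1,
    ("s0", "right", "s0"): 0.2, ("s0", "right", "s1"): 0.8,
    ("s1", "left",  "s0"): 0.5, ("s1", "left",  "s1"): 0.5,
    ("s1", "right", "s0"): 0.0, ("s1", "right", "s1"): 1.0,
}

# Sanity check: for each fixed (state, action) pair, the
# probabilities over all possible next states sum to 1.
for s in A:
    for u in U:
        assert abs(sum(T[(s, u, s2)] for s2 in A) - 1.0) < 1e-9
```

So the product does not "multiply" states by actions numerically; it just enumerates the input tuples on which $T$ (and likewise $R$) is defined.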

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange