Matrix notation in Sutton and Barto
-
14-12-2020
Question
On pg. 206 of Sutton and Barto's Reinforcement Learning, there is a curious statement about the result of a scalar product:
As I interpret it, A is the expectation of a scalar product of two $d$-dimensional vectors, which should be a scalar, right? So how do they get a $d \times d$ matrix from it? Is it shorthand for a scalar matrix (a diagonal matrix with this scalar product as the repeated coefficient)?
Solution
In Sutton & Barto, vectors are considered column vectors by default. So if you have this kind of product:
$$\mathbf{a}\mathbf{b}^T$$
where $\mathbf{a}$ and $\mathbf{b}$ are $d$ dimensional vectors, it does not calculate the scalar product. Instead it treats both vectors as matrices and calculates a matrix product, which will be a $d \times d$ matrix because you are multiplying a $d \times 1$ matrix by a $1 \times d$ matrix.
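This is easy to check numerically. Here is a small sketch using NumPy (the vector values are arbitrary, chosen just to illustrate the shapes):

```python
import numpy as np

d = 3
a = np.arange(1, d + 1).reshape(d, 1)  # column vector, shape (3, 1)
b = np.arange(4, 4 + d).reshape(d, 1)  # column vector, shape (3, 1)

# a b^T: a (d x 1) matrix times a (1 x d) matrix -> a (d x d) matrix,
# also known as the outer product of the two vectors
outer = a @ b.T
print(outer.shape)  # (3, 3)
```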
Worth noting that the scalar product can also be obtained as a $1 \times 1$ matrix if you follow the same matrix multiplication rules but transpose the first vector instead:
$$\mathbf{a}^T\mathbf{b}$$
which leads to multiplying a $1 \times d$ matrix by a $d \times 1$ matrix. This is why the value function approximation can be written as $\mathbf{w}^T\mathbf{x}_t$ (there is a small liberty taken of assuming a $1 \times 1$ matrix is the same as a scalar value in terms of notation).
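The same sketch applies to the inner-product form. The weight and feature values below are made up for illustration; the point is that $\mathbf{w}^T\mathbf{x}$ produces a $1 \times 1$ matrix, which is then read off as a scalar:

```python
import numpy as np

w = np.array([[0.5], [1.0], [-0.25]])  # weight column vector, shape (3, 1)
x = np.array([[2.0], [1.0], [4.0]])    # feature column vector, shape (3, 1)

# w^T x: a (1 x 3) matrix times a (3 x 1) matrix -> a (1 x 1) matrix
inner = w.T @ x
value = inner.item()  # take the "1x1 matrix is a scalar" liberty explicitly
print(inner.shape, value)  # (1, 1) 1.0
```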