Question

On pg. 206 of Sutton and Barto's Reinforcement Learning: An Introduction, there is a curious statement about the result of a scalar product:

[Image of the equation $\mathbf{A} \doteq \mathbb{E}\big[\mathbf{x}_t(\mathbf{x}_t - \gamma\mathbf{x}_{t+1})^T\big]$ from the book]

As I interpret it, $\mathbf{A}$ is the expectation of a scalar product of two $d$-dimensional vectors, which should be a scalar, right? So how do they get a $d \times d$ matrix from it? Is it shorthand for a scalar matrix (a diagonal matrix with that scalar product repeated along the diagonal)?


Solution

In Sutton & Barto, vectors are considered column vectors by default. So if you have this kind of product:

$$\mathbf{a}\mathbf{b}^T$$

where $\mathbf{a}$ and $\mathbf{b}$ are $d$-dimensional vectors, it is not the scalar product. Instead, it treats both vectors as matrices and calculates a matrix product, which yields a $d \times d$ matrix because you are multiplying a $d \times 1$ matrix by a $1 \times d$ matrix.
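
As a quick sanity check, here is a minimal NumPy sketch (not from the book; the variable names are illustrative) showing that the shapes work out this way:

```python
import numpy as np

d = 3
a = np.random.rand(d, 1)  # column vector, i.e. a d x 1 matrix
b = np.random.rand(d, 1)  # column vector, i.e. a d x 1 matrix

outer = a @ b.T           # (d x 1) @ (1 x d) -> d x d outer product
print(outer.shape)        # (3, 3)
```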

It is worth noting that the scalar product can also be calculated as a $1 \times 1$ matrix if you follow the same matrix multiplication rules but transpose the first vector instead:

$$\mathbf{a}^T\mathbf{b}$$

which leads to multiplying a $1 \times d$ matrix by a $d \times 1$ matrix. This is why the value function approximation can be written as $\mathbf{w}^T\mathbf{x}_t$ (with the small notational liberty of treating a $1 \times 1$ matrix as a scalar value).
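
The same point in a minimal NumPy sketch (again an illustration, with hypothetical names `w` and `x` rather than anything from the book):

```python
import numpy as np

d = 3
w = np.random.rand(d, 1)   # weight vector as a d x 1 matrix
x = np.random.rand(d, 1)   # feature vector x_t as a d x 1 matrix

inner = w.T @ x            # (1 x d) @ (d x 1) -> 1 x 1 matrix
value = inner.item()       # unwrap the 1 x 1 matrix into a plain scalar
print(inner.shape, value)  # (1, 1) and the scalar value estimate
```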
