Decoding problem and conditional probabilities
06-11-2019
Question
I'm reading the book by MacKay "Information theory, inference and learning algorithms" and I'm confused by how he introduces the decoding problem for LDPC codes (page 557).
given a channel output $\mathbf{r}$, we wish to find the codeword $\mathbf{t}$ whose likelihood $P(\mathbf{r}|\mathbf{t})$ is biggest.
I can get on board with that, even if it seems a bit backwards: since $\mathbf{r}$ is the evidence we have, I would be more comfortable trying to find the $\mathbf{t}$ that maximizes $P(\mathbf{t}|\mathbf{r})$. And indeed, he goes on to say
there are two approaches to the decoding problem, both of which lead to the generic problem 'find the $\mathbf{x}$ that maximizes $$ P^*(\mathbf{x})=P(\mathbf{x})\mathbb{1}[\mathbf{Hx}=\mathbf{z}]$$'
He presents the two points of view on the following page; the first one especially confuses me:
the codeword decoding point of view
First, we note the prior distribution over codewords $$ P(\mathbf{t})=\mathbb{1}[\mathbf{Ht}=\mathbf{0}]$$[...]. The posterior distribution over codewords is given by multiplying the prior by the likelihood [...] $$P(\mathbf{t}|\mathbf{r})\propto P(\mathbf{t})P(\mathbf{r}|\mathbf{t}) $$
From the generic decoding problem he gave, it looks like we're actually trying to maximize the posterior $P(\mathbf{t}|\mathbf{r})$ (the $P^*$ above is proportional to it), instead of the likelihood $P(\mathbf{r}|\mathbf{t})$.
Is it obvious that the two maxima are the same, with the same maximizer? It is not obvious to me: the $\mathbf{t_0}$ that maximizes $P(\mathbf{r}|\mathbf{t})$ might actually minimize $P(\mathbf{t})$, so the maximizer of $P(\mathbf{t})P(\mathbf{r}|\mathbf{t})$ might be different! I understand that in this case $P(\mathbf{t})$ is uniform over the codewords, so this shouldn't be a problem, but it seems odd that this is simply not stated, and I feel like I'm missing something.
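To convince myself of the uniform-prior case, I tried a small brute-force check (a toy example of my own, not from the book): with a uniform prior over codewords, the posterior is a constant times the likelihood, so the ML and MAP maximizers coincide.

```python
import itertools
import numpy as np

# Toy parity-check matrix (my own example, not from the book)
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
n = H.shape[1]
f = 0.1  # BSC flip probability

# Enumerate the codebook: all t with H t = 0 (mod 2)
codewords = [np.array(t) for t in itertools.product((0, 1), repeat=n)
             if not (H @ np.array(t) % 2).any()]

r = np.array([1, 0, 0, 1, 0, 1])  # an arbitrary received word

def likelihood(t):
    """P(r | t) for a binary symmetric channel with flip probability f."""
    d = int((t != r).sum())          # Hamming distance between t and r
    return f**d * (1 - f)**(n - d)

# ML decoding: maximize P(r|t) over the codebook
t_ml = max(codewords, key=likelihood)

# MAP decoding: maximize P(t)P(r|t); P(t) is uniform over codewords,
# so it is a constant factor that cannot change the argmax
t_map = max(codewords, key=lambda t: likelihood(t) / len(codewords))

print((t_ml == t_map).all())  # the two maximizers coincide
```

Of course this only shows the uniform case; my question is why the book does not say this explicitly.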
Why should we start with the problem of maximizing $P(\mathbf{r}|\mathbf{t})$ and not $P(\mathbf{t}|\mathbf{r})$? Why does he seem to switch after a few sentences? Am I correct in thinking the algorithm presented actually maximizes $P(\mathbf{t}|\mathbf{r})$?
Thank you