Question

Part-of-speech (POS) tagging is a famous problem in Natural Language Processing. A popular solution is to use Hidden Markov Models (HMMs).

That is, given a sentence $x_1 \dots x_n$ we want to find the sequence of POS tags $y_1 \dots y_n$ such that $y_1 \dots y_n = \arg\max_{y_1 \dots y_n} P(X, Y)$.

By the chain rule of probability, $P(X,Y) = P(Y)\,P(X \mid Y)$.

Solving POS tagging with an HMM relies on the independence assumptions behind the factorization $P(X,Y) = \prod_i p(y_i \mid y_{i-1})\, p(x_i \mid y_i)$, i.e. the transition probabilities $p(y_i \mid y_{i-1})$ and the emission probabilities $p(x_i \mid y_i)$.
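
To make the setup concrete, here is a minimal sketch of decoding under exactly these assumptions, with hypothetical toy transition and emission tables (the tag set, the probabilities and the example sentence are made up for illustration, not estimated from any real corpus):

```python
# Minimal Viterbi decoding sketch under the HMM assumptions above,
# using hypothetical toy probabilities.
from math import log

# Hypothetical transition p(y_i | y_{i-1}) and emission p(x_i | y_i) tables.
transitions = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.5,
}
emissions = {
    ("DET", "the"): 0.8, ("NOUN", "dog"): 0.4,
    ("NOUN", "barks"): 0.1, ("VERB", "barks"): 0.6,
}
tags = ["DET", "NOUN", "VERB"]

def viterbi(words):
    """Return the tag sequence maximizing p(Y, X) = prod p(y_i|y_{i-1}) p(x_i|y_i)."""
    # best[i][t] = (log-prob of best path ending in tag t at position i, backpointer)
    best = [{} for _ in words]
    for t in tags:
        p = transitions.get(("<s>", t), 1e-9) * emissions.get((t, words[0]), 1e-9)
        best[0][t] = (log(p), None)
    for i in range(1, len(words)):
        for t in tags:
            score, prev = max(
                (best[i - 1][s][0]
                 + log(transitions.get((s, t), 1e-9))
                 + log(emissions.get((t, words[i]), 1e-9)), s)
                for s in tags)
            best[i][t] = (score, prev)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # e.g. ['DET', 'NOUN', 'VERB']
```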

My first question: is there any particular reason to prefer solving this with a generative model that makes so many assumptions, rather than directly estimating $P(Y \mid X)$? Given the training corpus, it is still possible to estimate $p(y_i \mid x_i)$ directly.
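
For comparison, the direct per-token estimate $p(y_i \mid x_i)$ mentioned above could be obtained simply by counting (word, tag) pairs; a sketch on a hypothetical toy corpus:

```python
# Sketch of the direct per-token estimate p(y_i | x_i) from a (hypothetical)
# tagged corpus: count (word, tag) pairs and normalize per word.
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("barks", "NOUN"), ("loudly", "ADV")],
]

counts = defaultdict(Counter)
for sentence in corpus:
    for word, tag in sentence:
        counts[word][tag] += 1

def p_tag_given_word(tag, word):
    """Maximum-likelihood estimate of p(y_i = tag | x_i = word)."""
    total = sum(counts[word].values())
    return counts[word][tag] / total if total else 0.0

def tag_independently(words):
    """Tag each word with its most frequent tag, ignoring the tag context."""
    return [counts[w].most_common(1)[0][0] for w in words]

print(p_tag_given_word("VERB", "barks"))  # 0.5 with this toy corpus
print(tag_independently(["the", "dog"]))  # ['DET', 'NOUN']
```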

My second question: even if we are convinced that the generative model is preferable, why compute the joint as $P(Y,X) = P(Y)\,P(X \mid Y)$ and not as $P(X,Y) = P(X)\,P(Y \mid X)$? If we have an appropriate generative story, we could use $P(X,Y) = P(X)\,P(Y \mid X)$ as well. Is it stated anywhere that the assumed generative story is preferred?
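
For reference, both factorizations are exact; written out for the decoding objective (my own restatement, assuming only the definitions above):

$$\arg\max_{Y} P(X,Y) \;=\; \arg\max_{Y} P(Y)\,P(X \mid Y) \;\approx\; \arg\max_{Y} \prod_{i=1}^{n} p(y_i \mid y_{i-1})\,p(x_i \mid y_i),$$

$$\arg\max_{Y} P(X,Y) \;=\; \arg\max_{Y} P(X)\,P(Y \mid X) \;=\; \arg\max_{Y} P(Y \mid X),$$

since $P(X)$ does not depend on $Y$.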

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange