Question

I am trying to understand the details of using a Hidden Markov Model for the tagging problem.

The best concise description that I have found is the course notes by Michael Collins.

The goal is to find a function $f(x) = \arg\max_{y \in Y} p(y \mid x)$, where $y$ is a tag sequence for the sentence $x$.
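For context, the generative model in Collins' notes factors the joint probability of a sentence $x_1 \dots x_n$ and a tag sequence $y_1 \dots y_{n+1}$ (with $y_{n+1} = \text{STOP}$ and boundary tags $y_0 = y_{-1} = *$) into trigram transition parameters $q$ and emission parameters $e$:

$$p(x_1 \dots x_n, y_1 \dots y_{n+1}) = \prod_{i=1}^{n+1} q(y_i \mid y_{i-2}, y_{i-1}) \prod_{i=1}^{n} e(x_i \mid y_i)$$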

Question 1. It is suggested to use a generative model and to estimate the joint probability $p(x,y)$ from the training examples. But what is the reason to use a generative model and increase the amount of computation? Why not estimate $p(y|x)$ directly? I think it is possible to estimate the conditional probability straightforwardly from the training data.

Addendum. Do you know why we should use a generative model at all in this case (POS tagging)? As I understand it, if we can estimate $p(x,y)$, then with exactly the same success we can estimate $p(y|x)$ and directly find the answer to the question of what the best tagging $\hat{y}$ is, without the weak assumptions of a generative model. There must be a reason to use a generative model, and I don't see it yet. Can you explain the reason to me?
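To make Question 1 concrete, here is a minimal sketch of what "estimating $p(y|x)$ directly by counting" would look like on a toy tagged corpus. The corpus, tag names, and function name are all made up for illustration; this is not from Collins' notes.

```python
from collections import Counter

# Toy tagged corpus: each example is (sentence, tag sequence).
# D = determiner, N = noun, V = verb (illustrative tags only).
corpus = [
    (("the", "dog", "barks"), ("D", "N", "V")),
    (("the", "cat", "sleeps"), ("D", "N", "V")),
    (("the", "dog", "sleeps"), ("D", "N", "V")),
]

joint = Counter()       # counts of (sentence, tag sequence) pairs
marginal_x = Counter()  # counts of sentences alone

for x, y in corpus:
    joint[(x, y)] += 1
    marginal_x[x] += 1

def p_y_given_x(y, x):
    """Direct maximum-likelihood estimate: p(y|x) = count(x, y) / count(x)."""
    if marginal_x[x] == 0:
        return 0.0  # sentence never seen in training
    return joint[(x, y)] / marginal_x[x]

print(p_y_given_x(("D", "N", "V"), ("the", "dog", "barks")))  # 1.0
print(p_y_given_x(("D", "N", "V"), ("the", "dog", "runs")))   # 0.0
```

Note that this direct estimate assigns probability zero to any sentence not seen verbatim in training, which is one way to frame what is at stake in the question.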

Question 2. Assume we have decided to use a generative model and to estimate $p(x,y)$. Why do we decompose it as $p(x,y)=p(y)p(x|y)$ and not as $p(x,y)=p(x)p(y|x)$?

Addendum. I do understand that it is very logical to use the decomposition $p(y)p(x|y)$, because by doing so we approach $p(y|x)$, so mathematically it seems very reasonable. However, with respect to the task, I don't see what the problem is with decomposing it as $p(x,y)=p(x)p(y|x)$. There should be some reason why we cannot decompose it this way, and I don't understand why.
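To illustrate the addendum's point that both factorizations are mathematically valid, here is a small numeric check on a made-up joint distribution over one word and one tag (toy numbers, nothing from the notes): both $p(y)p(x|y)$ and $p(x)p(y|x)$ recover the same joint.

```python
# Toy joint distribution p(x, y) over two words and two tags.
p_xy = {
    ("dog", "N"): 0.4,
    ("dog", "V"): 0.1,
    ("run", "N"): 0.1,
    ("run", "V"): 0.4,
}

# Marginals obtained by summing the joint.
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in ("dog", "run")}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in ("N", "V")}

for (x, y), p in p_xy.items():
    p_x_given_y = p_xy[(x, y)] / p_y[y]  # conditional p(x|y)
    p_y_given_x = p_xy[(x, y)] / p_x[x]  # conditional p(y|x)
    # Both factorizations reproduce the joint exactly.
    assert abs(p_y[y] * p_x_given_y - p) < 1e-12
    assert abs(p_x[x] * p_y_given_x - p) < 1e-12
```

Both identities are just the chain rule, so the choice between them cannot be settled by the mathematics alone, which is exactly what the question is asking about.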

I appreciate your help.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange