Question

I'm currently trying to implement this paper, but I'm struggling to understand some of the math. I'm fairly sure I understand how to implement the E-step, but I'm confused about how to compute the M-step. Just before Section 3.1, the paper says that $p_1(x, z; \theta_1) = p(e)\,p(a, f \mid e; \theta_1)$, and the same for $p_2$ but with $e$ and $f$ swapped. The second factor makes sense to me, but what is $p(e)$ or $p(f)$? As I understand it, $e$ and $f$ are sentences in the bitext, so how would we compute the probability of a sentence?

It says earlier that $p(e)$ and $p(f)$ are arbitrary distributions that don't affect the optimization problem, but then how do we compute $p_1(x, z; \theta_1)$?
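
To make the claim concrete, here is how I read it (my own sketch, not taken from the paper; $q$ is my notation for the posterior over alignments from the E-step, and I'm assuming $z = a$). Taking logs splits the joint into one term that depends on $\theta_1$ and one that does not:

$$\log p_1(x, z; \theta_1) = \log p(e) + \log p(a, f \mid e; \theta_1)$$

So, assuming $p(e)$ has no dependence on $\theta_1$, the M-step objective would reduce to

$$\arg\max_{\theta_1} \mathbb{E}_{q(a)}\big[\log p_1(x, z; \theta_1)\big] = \arg\max_{\theta_1} \mathbb{E}_{q(a)}\big[\log p(a, f \mid e; \theta_1)\big]$$

If that reading is right, $p(e)$ never has to be evaluated to update the parameters, but I still don't see whether it is needed to compute the value of $p_1$ itself.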

Thanks!


Licensed under: CC-BY-SA with attribution