Question

This might be more of a math problem, but I couldn't find any relevant documentation elsewhere.

I just want to figure out which equation is used to calculate the alignment score in GIZA++.

Does anyone have an idea?

Thank you for your help in advance.


Solution

If it helps, I found this document, which includes the following description:

Implements full IBM-4 alignment model with a dependency of word classes as described in (Brown et al. 1993)

Following up on that reference leads to a paper entitled "The Mathematics of Statistical Machine Translation: Parameter Estimation", which you can find in PDF format here.

The paper gives details of the math underlying the five alignment models (IBM Models 1 to 5) and is too long to paste here. Perhaps you can see whether its description of Model 4, which is what I assume GIZA++ uses, is sufficiently detailed.
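For orientation only, here is the simplest of the five models, Model 1, as given in Brown et al. (1993). It scores an alignment a of a target sentence f (length m) against a source sentence e (length l, plus a NULL word at position 0) using a constant ε and the word translation probabilities t. This is not the Model 4 equation; Model 4 replaces the uniform position term with fertility and distortion probabilities, the latter conditioned on word classes, and the paper gives its full form.

```latex
\Pr(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t\!\left(f_j \mid e_{a_j}\right)
```

Here a_j is the source position that target word f_j is aligned to. The best-scoring alignment under the trained model is the one that maximises this quantity.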

There is also this PDF, which summarises the models and training process.

OTHER TIPS

In short, word alignments and translation probabilities are learned over multiple iterations of the Expectation-Maximization (EM) algorithm.
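As a rough illustration of that EM loop, here is a minimal sketch for IBM Model 1 only, the simplest of the models. It is not GIZA++'s code and not Model 4. The function names `train_ibm_model1` and `best_alignment`, the `<NULL>` token, and the toy corpus are all made up for this example.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical placeholder for the empty source word


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM. corpus is a list of (e_tokens, f_tokens)."""
    f_vocab = {f for _, fs in corpus for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each target word over the source words.
        for es, fs in corpus:
            es = [NULL] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_alignment(t, es, fs, epsilon=1.0):
    """Return the most probable Model 1 alignment and its score."""
    es = [NULL] + es
    alignment = []
    score = epsilon / (len(es) ** len(fs))
    for f in fs:
        # Under Model 1 each target word picks its source word independently.
        i = max(range(len(es)), key=lambda k: t[(f, es[k])])
        alignment.append(i)
        score *= t[(f, es[i])]
    return alignment, score


corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm_model1(corpus)
alignment, score = best_alignment(t, ["the", "house"], ["das", "Haus"])
```

In `alignment`, position j holds the index of the source word chosen for the j-th target word, with 0 meaning the NULL word. The score is the Model 1 probability of that alignment; the higher IBM models add fertility and distortion terms on top of the same EM idea.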

Philipp Koehn's book "Statistical Machine Translation" has a chapter on word alignment. Check statmt.org for more information.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow