Question

This might be more of a math problem, but I couldn't find any relevant documentation elsewhere.

I just want to figure out which equation is used to calculate the alignment score in GIZA++.

Might anyone have an idea?

Thank you for your help in advance.


Solution

If it helps, I found this document, which includes the following description:

Implements full IBM-4 alignment model with a dependency of word classes as described in (Brown et al. 1993)

Following up that reference leads to a paper entitled "The Mathematics of Statistical Machine Translation: Parameter Estimation", which you can find in PDF format here.

The paper gives details of the math underlying the five alignment models and is too long to paste here. Perhaps you can check whether its description of Model 4, which is what I assume GIZA++ uses, is detailed enough for your purposes.

There is also this PDF, which summarises the models and training process.
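For orientation, here is the simplest case from that paper, Model 1, written in the notation of Brown et al. (1993). This is only a starting point and not the full Model 4 equation: e is the sentence of length l being aligned to, f is the sentence of length m being generated, a_j is the position in e aligned to word f_j (position 0 is the empty NULL word), t is the word translation probability, and epsilon is a small constant for the probability of the length m.

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t\left(f_j \mid e_{a_j}\right)
```

The later models keep the product of translation probabilities but replace the uniform factor 1/(l+1) with position-dependent alignment terms (Model 2) and then with fertility and distortion terms (Models 3 to 5), so the score of an alignment under Model 4 has more factors than shown here.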

Other suggestions

In short, word alignments and translation probabilities are learned over multiple iterations of the Expectation-Maximization (EM) algorithm.
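To make the EM idea concrete, here is a minimal sketch of EM training for IBM Model 1 only, the simplest of the models. It is not GIZA++'s code and it leaves out everything the higher models add (distortion, fertility, word classes). The function names, the `<NULL>` token string, the toy corpus, and the choice of epsilon = 1 in the score are all assumptions made for illustration.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical marker for the empty word at position 0


def train_ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1. pairs: list of (e_tokens, f_tokens).

    Returns a dict t where t[(f, e)] approximates t(f | e).
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each f word over the e words of its sentence.
        for es, fs in pairs:
            es = [NULL] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def viterbi_align(t, es, fs):
    """Best Model 1 alignment and its score (with epsilon assumed to be 1)."""
    es = [NULL] + es
    alignment, score = [], 1.0
    for f in fs:
        # Under Model 1 each f word picks its best e position independently.
        best = max(range(len(es)), key=lambda i: t[(f, es[i])])
        alignment.append(best)
        score *= t[(f, es[best])] / len(es)  # len(es) == l + 1
    return alignment, score


# Toy parallel corpus, invented for this example.
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm_model1(corpus)
alignment, score = viterbi_align(t, ["the", "house"], ["das", "Haus"])
```

Here `alignment` holds, for each word of the second sentence, the position of the word it is linked to in the first sentence (0 meaning NULL), and `score` is the product of translation probabilities divided by (l+1)^m. GIZA++ runs further training stages with the richer models on top of this kind of initialisation, so its reported scores will not match this sketch.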

Philipp Koehn's book "Statistical Machine Translation" has a chapter on word alignment. See statmt.org for more information.

Licensed under: CC-BY-SA with attribution