NAIST-IS-MT1251126 : Frances Pikyu Yung

Improving Word Alignment for Statistical Machine Translation by Prediction of Unalignable Words

Frances Pikyu Yung (1251126)

Word alignment is a common first step in a Statistical Machine Translation (SMT) system, yet professional human translators do not translate word-for-word but sense-for-sense.

This discrepancy gives rise to unalignable words. Depending on how the two languages of a pair differ, certain words can be predicted to be added in the translation to complete the sense, with no corresponding word in the source text. This study analyzes the distribution and linguistic properties of unalignable words using a hand-aligned parallel corpus of Chinese and English.

Based on the findings, a method is proposed to predict unalignable words and remove them before automatic word alignment, so as to improve alignment accuracy. The method improves the word alignment accuracy of GIZA++ when evaluated against manual annotation, and it also improves the end-to-end SMT results.
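
The pre-removal step can be illustrated with a minimal Python sketch. It is not the thesis's actual implementation: the function names `remove_unalignable` and `restore_links`, the tiny stoplist predictor, and the example sentence pair are all assumptions made for illustration. In the real method the predictor would be derived from the corpus analysis, and the alignment links would come from an aligner such as GIZA++. The sketch only shows how tokens predicted to be unalignable can be filtered out while keeping an index map, so that links found on the reduced sentences can be mapped back to the original token positions.

```python
from typing import Callable, List, Tuple


def remove_unalignable(
    tokens: List[str], is_unalignable: Callable[[str], bool]
) -> Tuple[List[str], List[int]]:
    """Drop tokens predicted to be unalignable.

    Returns the kept tokens and, for each kept token, its position
    in the original sentence.
    """
    kept: List[str] = []
    index_map: List[int] = []
    for i, tok in enumerate(tokens):
        if not is_unalignable(tok):
            kept.append(tok)
            index_map.append(i)
    return kept, index_map


def restore_links(
    links: List[Tuple[int, int]], src_map: List[int], tgt_map: List[int]
) -> List[Tuple[int, int]]:
    """Map alignment links on the reduced sentences back to original positions."""
    return [(src_map[s], tgt_map[t]) for s, t in links]


# Hypothetical predictor: a placeholder stoplist, not the thesis's model.
EN_UNALIGNABLE = {"a", "the"}

zh = ["他", "是", "学生"]
en = ["he", "is", "a", "student"]

zh_kept, zh_map = remove_unalignable(zh, lambda tok: False)
en_kept, en_map = remove_unalignable(en, lambda tok: tok in EN_UNALIGNABLE)

# Links an aligner might produce on the reduced pair (assumed, not computed here).
reduced_links = [(0, 0), (1, 1), (2, 2)]
original_links = restore_links(reduced_links, zh_map, en_map)
```

Here the English article "a" is removed before alignment, so the aligner sees three tokens on each side, and the restored links point "学生" at position 3 ("student") in the original English sentence. The removed word simply remains unaligned.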