Supplementary Material for
Incorporating Discrete Translation Lexicons into Neural Machine Translation
Philip Arthur, Graham Neubig, Satoshi Nakamura
Nara Institute of Science and Technology - Augmented Human Communication Lab
Carnegie Mellon University - Language Technologies Institute


Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence. We propose a method to alleviate this problem by augmenting NMT systems with discrete translation lexicons that efficiently encode translations of these low-frequency words. We describe a method to calculate the lexicon probability of the next word in the translation candidate by using the attention vector of the NMT model to select which source word lexical probabilities the model should focus on. We test two methods to combine this probability with the standard NMT probability: (1) using it as a bias, and (2) linear interpolation. Experiments on two corpora show an improvement of 2.0-2.3 BLEU and 0.13-0.44 NIST score, and faster convergence time.
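
The combination step summarized above can be illustrated with a minimal sketch in plain Python. It is not the released decoder code. The function names, the toy lexicon, the fixed interpolation weight `lam`, and the smoothing constant `eps` are all assumptions made for illustration. The sketch assumes the lexicon gives, for each source word, a probability distribution over target words, and that the attention vector sums to one over the source words.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lexicon_probability(lex_columns, attention):
    """Attention-weighted lexicon probability of each target word.

    lex_columns[j][e] is the lexicon probability of target word e given
    source word j; attention[j] is the attention weight on source word j.
    """
    vocab_size = len(lex_columns[0])
    return [
        sum(a * col[e] for a, col in zip(attention, lex_columns))
        for e in range(vocab_size)
    ]


def combine_bias(nmt_scores, p_lex, eps=1e-3):
    """Method (1): add the log lexicon probability as a bias before softmax.

    eps (an assumed value) keeps log() finite when p_lex is zero.
    """
    return softmax([s + math.log(p + eps) for s, p in zip(nmt_scores, p_lex)])


def combine_linear(nmt_scores, p_lex, lam=0.5):
    """Method (2): linear interpolation of lexicon and NMT probabilities.

    lam is fixed here for illustration; a real system could learn it.
    """
    p_nmt = softmax(nmt_scores)
    return [lam * pl + (1.0 - lam) * pn for pl, pn in zip(p_lex, p_nmt)]


# Toy example: 2 source words, 3 target words (hypothetical numbers).
lex_columns = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
attention = [0.75, 0.25]
p_lex = lexicon_probability(lex_columns, attention)
p_bias = combine_bias([0.0, 0.0, 0.0], p_lex)
p_linear = combine_linear([0.0, 0.0, 0.0], p_lex)
```

With uniform NMT scores, as in the toy example, both methods favor the target word that the lexicon associates with the most strongly attended source word.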

Method + Results

The method and results are described in the paper.


To reproduce the experiments, first install the following tools:

  • Moses: Phrase-based machine translation toolkit.
  • Travatar: Syntax-based machine translation decoder for tree-to-string and hierarchical translation.
  • GIZA++: Word aligner based on the IBM models.
  • Chainer: Neural network toolkit, written in Python 3.

After installing these tools, make sure to set their paths in config.ini.
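
Before running the experiments, it can help to check that every path listed in config.ini actually exists. The following is a minimal sketch using Python's standard configparser module. The section name "tools" and the key names in the example are hypothetical, since the actual layout of config.ini is defined by the scripts themselves; adjust them to match your file.

```python
import configparser
import os


def missing_tool_paths(config_text, section="tools"):
    """Return the names of entries in `section` whose paths do not exist.

    The section name is an assumption; it is not taken from the
    project's real config.ini.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        raise ValueError("section [%s] not found" % section)
    missing = []
    for name, path in parser.items(section):
        # Expand "~" so that home-relative paths are checked correctly.
        if not os.path.exists(os.path.expanduser(path)):
            missing.append(name)
    return missing


def check_config_file(filename="config.ini", section="tools"):
    """Read a config file from disk and report missing tool paths."""
    with open(filename, encoding="utf-8") as handle:
        return missing_tool_paths(handle.read(), section)
```

For example, calling check_config_file("config.ini") returns an empty list when every path in the chosen section exists, and otherwise the names of the entries that need to be fixed.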


These are the scripts we used for this paper.

If you have any problems using the scripts, please feel free to contact philip.arthur.om0 (!

The latest version of the decoder can be found in our GitHub repository. The version we used for this paper is v1.0.0.