Analysis of Patterns of Complex Sentences for Statistical Machine Translation

To Chinh (1051026)

This study addresses machine translation (MT) of complex sentences from English to Japanese. To overcome the differences in word order between the source and target languages, we divide and rewrite complex sentences into simple clauses for MT, using syntactic information and functional words.

Assuming that there is little variation in the syntactic structures that connect clauses in complex sentences, we extract a set of high-frequency clause patterns based on the syntactic information of the clauses. We use these patterns to rewrite complex sentences into simple clauses and to select the correct translation when a pattern has several possible translations in the target language.
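
As a minimal sketch of the frequency-based extraction step, assume that each dependent clause has already been reduced to a pattern string. The pattern labels below (such as `SBAR(because S)`) and the function name `top_patterns` are hypothetical and are not taken from the study; under that assumption, the high-frequency patterns can be selected with a simple counter:

```python
from collections import Counter


def top_patterns(clause_patterns, k):
    """Return the k most frequent patterns, most frequent first."""
    counts = Counter(clause_patterns)
    return [pattern for pattern, _ in counts.most_common(k)]


# Hypothetical pattern labels, for illustration only.
observed = [
    "SBAR(because S)",
    "SBAR(when S)",
    "SBAR(because S)",
    "SBAR(that S)",
    "SBAR(because S)",
    "SBAR(when S)",
]

# Keep the two most frequent patterns from the toy sample.
frequent = top_patterns(observed, 2)
```

How a clause is mapped to its pattern string depends on the parser and the syntactic features used, which this sketch leaves open.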

We have extracted a set of 100 patterns, which cover 82 percent of all dependent clauses in complex sentences. Moreover, because dependent clauses are closely related to functional words, which help to define the patterns and the relations among the components of a sentence, the study of complex sentence patterns will also help the translation of functional words.
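
One plausible reading of the coverage figure is the fraction of dependent-clause occurrences whose pattern is among the k most frequent patterns. The abstract does not define the measure precisely, so the sketch below is an assumption, and the function name `coverage` and the toy data are hypothetical:

```python
from collections import Counter


def coverage(clause_patterns, k):
    """Fraction of clause occurrences covered by the k most frequent patterns."""
    if not clause_patterns:
        return 0.0
    counts = Counter(clause_patterns)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(clause_patterns)


# Toy data: three hypothetical patterns with 5, 3, and 2 occurrences.
sample = ["P1"] * 5 + ["P2"] * 3 + ["P3"] * 2

# Share of the 10 occurrences covered by the top two patterns.
ratio = coverage(sample, 2)
```

With the study's data, the same computation with k = 100 would correspond to the reported 82 percent, provided coverage is counted over clause occurrences as assumed here.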