NAIST-IS-MT1051107: Tomoya Mizumoto

Automated Japanese Error Correction with Revision Logs of Language Learning SNS

水本智也 (1051107)

We present an attempt to extract a large-scale corpus of Japanese learners' writing from the revision logs of a language learning social networking service (SNS). Such a corpus is easy to obtain at large scale, covers a wide variety of topics and styles, and can be a valuable source of knowledge for both language learners and instructors. We also demonstrate that the extracted corpus of Japanese as a second language can be used as training data for correcting learners' errors with a statistical machine translation approach. We evaluate different granularities of tokenization to alleviate the problem of word segmentation errors caused by erroneous input from language learners.
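As an illustration of what a finer tokenization granularity might look like, the following minimal Python sketch splits a learner sentence and its correction into single characters, producing whitespace-separated tokens of the kind statistical machine translation toolkits typically expect. Character-level splitting is assumed here as one possible granularity; the function names and the example sentence pair are hypothetical and are not taken from the corpus.

```python
from typing import Tuple


def char_tokenize(sentence: str) -> str:
    """Split a sentence into single characters joined by spaces.

    Existing whitespace is dropped, so the output does not depend on
    any word segmentation of the (possibly erroneous) input.
    """
    return " ".join(ch for ch in sentence if not ch.isspace())


def make_parallel_pair(learner: str, corrected: str) -> Tuple[str, str]:
    """Build one (source, target) training pair at character granularity."""
    return char_tokenize(learner), char_tokenize(corrected)


# Hypothetical revision-log entry: the learner used the particle を
# where the correction uses に.
source, target = make_parallel_pair("私は学校を行きます", "私は学校に行きます")
print(source)
print(target)
```

Because every character is its own token, an error in the learner's sentence cannot cause a word segmenter to mis-split the surrounding text. The trade-off is that sequences become longer and each token carries less context than a word-level token would.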