Computer-assisted Japanese Functional Expression Learning for Chinese-speaking Learners

LIU JUN (1661028)


Learners of Japanese must acquire a large vocabulary as well as a wide range of functional expressions. Because Japanese and Chinese share a large number of Chinese characters, Chinese-speaking learners of Japanese as a second language (JSL) find Japanese functional expressions more challenging to learn than other Japanese vocabulary.

The goal of this thesis is to develop a computer-assisted language learning (CALL) system designed specifically for Chinese-speaking learners studying Japanese functional expressions. Given a Japanese sentence as input, the system automatically detects Japanese functional expressions using a character-based bidirectional long short-term memory model with a conditional random field layer (BiLSTM-CRF). The whole sentence is then segmented into words and part-of-speech (POS) tagged by MeCab, a Japanese morphological analyzer trained with a CRF model. Meanwhile, difficult Japanese functional expressions are replaced with easier functional expressions or phrases, using a “Simple Japanese Functional Expression Replacement List”. In addition, the system provides JSL learners with appropriate example sentences for Japanese functional expressions. We apply a support vector machine for ranking (SVMrank) to estimate sentence readability, using Japanese-Chinese common words as an important feature. Furthermore, using the k-means clustering algorithm, we cluster example sentences that contain functional expressions with the same meaning, based on part of speech, conjugation form, and semantic attribute.

Correcting spelling and grammatical errors in Japanese functional expressions is of practical use to JSL learners. However, collecting this type of error data is fairly difficult. To address this problem, we apply the BiLSTM-CRF model to detect Japanese functional expressions and extract phrases that include the functional expressions together with their neighboring words from native Japanese and learners’ corpora. We then generate a large amount of artificial error data via substitution, injection, or deletion. We use the generated artificial error data to train a character-based sequence-to-sequence (seq2seq) neural machine translation model for error correction of Japanese functional expressions.
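The three corruption operations named above can be illustrated with a minimal sketch. It is not the generation procedure used in the thesis: the function names, the confusion character set, the example phrase, and the choice to corrupt exactly one character inside the functional expression are all assumptions made for illustration.

```python
import random

# Hypothetical set of hiragana used as replacement or injected characters.
# The thesis does not specify its confusion set; this one is illustrative only.
CONFUSION_CHARS = "はがをにでともの"


def corrupt(phrase, span, rng):
    """Apply one random character-level edit inside span = (start, end).

    `span` marks the functional expression within `phrase` (end exclusive).
    Returns (operation_name, corrupted_phrase).
    """
    start, end = span
    chars = list(phrase)
    pos = rng.randrange(start, end)
    op = rng.choice(["substitution", "injection", "deletion"])
    if op == "substitution":
        # Exclude the current character so the result always differs.
        candidates = [c for c in CONFUSION_CHARS if c != chars[pos]]
        chars[pos] = rng.choice(candidates)
    elif op == "injection":
        chars.insert(pos, rng.choice(CONFUSION_CHARS))
    else:  # deletion
        del chars[pos]
    return op, "".join(chars)


def generate_pairs(phrase, span, n, seed=0):
    """Build n (operation, erroneous source, correct target) triples."""
    rng = random.Random(seed)
    return [(*corrupt(phrase, span, rng), phrase) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative phrase; the functional expression にもかかわらず occupies
    # character positions 1 to 7, so the span is (1, 8).
    for op, source, target in generate_pairs("雨にもかかわらず出かけた", (1, 8), n=5):
        print(op, source, "->", target)
```

Each (erroneous, correct) pair could then serve as a source and target sequence for a character-based seq2seq model, with the correct phrase as the target.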

We conduct experiments to demonstrate the effectiveness of the proposed methods, and we report on a preliminary user study with Chinese-speaking JSL learners that evaluates the usefulness of the system.