Colloquium B Presentations

Date/Time: September 19 (Wed), Period 1 (9:20-10:50)

Venue: L1

Chair: 進藤 裕之
中山 佐保子 (M, 2nd presentation) Intelligent Communication: 中村 哲, 松本 裕治, Sakriani Sakti
title: Speech Chain for Semi-Supervised Learning of Japanese-English Code-Switching ASR and TTS
abstract: Code-switching (CS) speech, in which speakers alternate between two or more languages within the same utterance, often occurs in multilingual communities. This phenomenon poses challenges for spoken language technologies such as automatic speech recognition (ASR) and text-to-speech synthesis (TTS), since these systems need to handle input in a multilingual setting. Code-switching text and code-switching speech can be found in social media, but parallel speech-text code-switching data suitable for training ASR and TTS is mostly unavailable. Furthermore, most existing approaches, developed for bilingual code-switching, rely mainly on supervised learning with CS data and target either ASR or TTS alone. In contrast, our study constructs sequence-to-sequence models for both Japanese-English code-switching ASR and TTS that are jointly trained through a loop connection. We utilize a deep-learning-based speech chain framework to enable ASR and TTS to learn code-switching in a semi-supervised fashion. We first train the ASR and TTS systems separately on parallel speech-text data of monolingual Japanese and English (supervised learning), which resembles how students learn multiple languages in school. After that, we run the speech chain with only code-switching text or only code-switching speech (unsupervised learning), which imitates how humans simultaneously listen and speak in a CS context in a multilingual environment. Experimental results reveal that such a closed-loop architecture allows ASR and TTS to learn from each other and improve performance even without any parallel code-switching data.
language of the presentation: Japanese
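A minimal Python sketch of the closed-loop idea in the abstract above: given only unpaired text, TTS synthesizes speech, ASR transcribes it back, and the reconstruction error against the original text supplies a training signal (and symmetrically for unpaired speech). The toy "models" and names here are purely illustrative, not the presenter's implementation.

```python
def speech_chain_text_step(text, tts, asr):
    """Unsupervised step from unpaired text: TTS -> ASR -> reconstruction loss."""
    speech = tts(text)        # synthesize speech from text with no paired audio
    hypothesis = asr(speech)  # recognize the synthesized speech
    # Toy reconstruction loss: positionwise character mismatches plus length gap
    loss = sum(a != b for a, b in zip(hypothesis, text))
    return loss + abs(len(hypothesis) - len(text))

def speech_chain_speech_step(speech, asr, tts):
    """Unsupervised step from unpaired speech: ASR -> TTS -> reconstruction loss."""
    transcript = asr(speech)
    reconstructed = tts(transcript)
    return sum(abs(a - b) for a, b in zip(reconstructed, speech))

# Toy stand-ins: TTS maps characters to code points, ASR maps them back.
toy_tts = lambda text: [ord(c) for c in text]
toy_asr = lambda speech: "".join(chr(x) for x in speech)

# A code-switching sentence with no paired speech still yields a loss signal.
cs_text = "今日は very busy です"
print(speech_chain_text_step(cs_text, toy_tts, toy_asr))  # 0 for perfect toy models
```

With real sequence-to-sequence models, the loss would be backpropagated into ASR (and TTS) instead of printed, which is what lets the two systems improve each other without parallel CS data.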
 
西村 優汰 (M, 2nd presentation) Intelligent Communication: 中村 哲, 松本 裕治, 須藤 克仁, Graham Neubig (visiting)
title: Multi-Source Neural Machine Translation with Missing Data
abstract: Multi-source translation is an approach that exploits multiple inputs (e.g., in two different languages) to increase translation accuracy. I examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are incomplete due to the difficulty of providing translations in all of the relevant languages (for example, in TED Talks, most English talks have subtitles in only a small portion of the languages that TED supports).
language of the presentation: Japanese
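One common way to feed a multi-source model when a translation is missing is to substitute a placeholder sentence for the absent source. The sketch below assumes that convention with a hypothetical `<NULL>` token; it is an illustration of the data-preparation step, not necessarily the presenter's exact method.

```python
NULL = "<NULL>"  # assumed placeholder for a missing source sentence

def make_multi_source_example(sources, target):
    """Build one training example for multi-source NMT.

    sources: dict mapping language code -> sentence, or None when that
             translation is missing from the corpus.
    target:  the target-language sentence.
    """
    filled = {lang: (sent if sent is not None else NULL)
              for lang, sent in sources.items()}
    return filled, target

# Example: a TED-style corpus where the Spanish subtitle is unavailable.
example, tgt = make_multi_source_example(
    {"en": "Thank you so much, Chris.", "es": None},
    "本当にありがとう、クリス。",
)
print(example["es"])  # <NULL>
```

Every example then has the same set of source slots, so a single multi-encoder model can be trained on the full corpus instead of only on the complete-translation subset.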