Colloquium B Presentations

Date: Monday, September 14, 2nd period (11:00-12:30)


Venue: L1

Chair: 進藤 裕之
中山 佐保子 D, interim presentation, Intelligent Communication: 中村 哲, 渡辺 太郎, Sakriani Sakti
title: Multilingual machine speech chain allowing zero-shot code-switching
abstract: Building automatic speech recognition (ASR) and text-to-speech (TTS) systems for code-switching in a supervised fashion is challenging because large amounts of code-switching speech and the corresponding transcriptions are usually unavailable. The machine speech chain mechanism can be used to achieve semi-supervised learning: when ASR and TTS receive unpaired data, each infers the missing pair for the other, and both models are optimized with a reconstruction loss. In this study, we handle multiple code-switching language pairs by integrating language embeddings into the machine speech chain, and we investigate whether the model can handle code-switching language pairs that are never explicitly seen during training. Experimental results reveal that the proposed approach improves performance on the multilingual code-switching language pairs on which the model was trained and can also handle unseen code-switching language pairs without being trained on them directly.
language of the presentation: Japanese
 
東 佑樹 M, second presentation, Intelligent Communication: 中村 哲, 渡辺 太郎, 作村 諭一, Sakti Sakriani Watiasri (Specially Appointed Associate Professor)
title: Automatic Speech Recognition That Recovers Words Missing Due to Partially Corrupted Speech
abstract: In recent years, automatic speech recognition (ASR) has made remarkable progress and has become a fundamental technology in many systems, such as smart speakers. In real life, speech recognition systems are expected to be used in a variety of situations, where the input speech is often interrupted by sudden noise or microphone howling. ASR is sensitive to such external disturbances, and its recognition accuracy may drop significantly. We therefore aim to reduce recognition errors by adding a mechanism that recovers words missing due to partially corrupted speech.
language of the presentation: Japanese
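Japanese title (translated): Speech recognition that restores textual information lost due to speech corruption
Japanese abstract (translated): In recent years, automatic speech recognition (ASR) has developed remarkably and has become a fundamental technology in many systems encountered in everyday life, such as smart speakers. In real life, speech recognition systems are expected to be used in various environments. In real environments, input speech is often interrupted by sudden noise, microphone howling, and the like. ASR is easily affected by such environmental factors, and its recognition rate may drop significantly. This study therefore aims to reduce ASR recognition errors by adding a mechanism that restores the textual information lost due to speech corruption.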
 
国広 有衣子 M, second presentation, Intelligent Communication: 中村 哲, 渡辺 太郎, 作村 諭一, Sakti Sakriani Watiasri (Specially Appointed Associate Professor), 須藤 克仁
title: Tree-To-Speech for Appropriate Prosody of Synthesized Speech
abstract: Human speech carries prosody. Here, prosody refers to all speech information that is lost when speech is written down, such as intonation and pauses. It conveys many features, such as emotion, emphasis, and the structure of the sentence. Inappropriate prosody cannot properly convey the sentence structure or the speaker's intent and may therefore cause miscommunication. Tacotron is a recent speech synthesis system that predicts spectrograms directly from text. However, because it does not consider sentence structure, it may generate inappropriate prosody. We aim to synthesize speech with appropriate prosody by using sentence structure.
language of the presentation: Japanese
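Japanese title (translated): Speech synthesis with appropriate prosody that takes sentence structure into account
Japanese abstract (translated): The prosody of human speech normally contains information related to the speaker's intent, such as sentence structure. Here, prosody refers to all speech information that is lost when speech is transcribed into text, such as intonation and breathing pauses. Speech with unnatural prosody cannot properly convey the meaning of a sentence and may hinder smooth communication. Meanwhile, Tacotron is a speech synthesis technique that has attracted attention in recent years. Tacotron predicts speech spectrograms from text, but because it does not take sentence structure into account, it may produce inappropriate prosody. We therefore aim to achieve synthesized speech with appropriate prosody by taking sentence structure into account.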
 

Venue: L2

Chair: SOUFI Mazen
稲葉 光彦 M, second presentation, Mathematical Informatics: 池田 和司☆, 作村 諭一, 川鍋 一晃 (Visiting Professor), 森本 淳 (Visiting Associate Professor), 福嶋 誠
title: Decoding the direction of leftward and rightward movements of the same limb from EEG
abstract: A brain-machine interface (BMI) is a technology that uses changes in brain activity to operate external devices. Outside the laboratory, electroencephalography (EEG) and near-infrared spectroscopy (NIRS) are the main measurement methods for BMI. To achieve high accuracy, non-invasive BMIs based on EEG and NIRS usually rely on mental states that do not match the intended actions. This lack of intuitiveness degrades BMI operability and prolongs training time. As a first step toward a more natural BMI, we aim to decode the intended direction of single-arm movements. In this talk, I will present an overview and the results of an EEG experiment investigating differences in brain activity when imagining moving the same arm to the left and to the right.
language of the presentation: Japanese
 
酒井 翠 M, second presentation, Mathematical Informatics: 池田 和司, 佐藤 嘉伸, 吉本 潤一郎, 久保 孝富 (Specially Appointed Associate Professor), 福嶋 誠, 日永田智絵
title: An exploratory study on brain volumetric features of non-melancholic depression
abstract: Major depressive disorder (MDD), also referred to as depression, is a mental disorder that affects over 300 million people worldwide. According to DSM-5, it is characterized by a depressed mood lasting most of the day, reduced interest or pleasure in previously enjoyed activities, and fatigue or loss of energy. Depression can be divided into subtypes according to its symptoms. The typical form is known as "melancholic depression," whose symptoms often include negative feelings such as anger and guilt and a long-lasting depressed mood arising from various causes. In contrast, a newer subtype of depressive episode, referred to as "non-melancholic depression," has been increasing in recent years. Although non-melancholic depression is diagnosed as MDD just like melancholic depression, its symptoms differ greatly: whereas melancholic depression involves persistent depressive symptoms, the symptoms of non-melancholic depression appear and disappear depending on the situation. Moreover, patients with non-melancholic depression tend to show milder severity than those with melancholic depression. Previous studies comparing healthy controls with patients with melancholic depression have reported differences in several brain regions, such as the hippocampus, amygdala, and thalamus, and the hippocampus and amygdala are already being considered as supplementary information for the current symptom-based diagnosis. Therefore, if brain structure also differs between melancholic and non-melancholic depression, such differences could serve as a biomarker for diagnosing non-melancholic depression.
In this study, we hypothesize that melancholic features of depression are associated with differences in brain structure, and we investigate which brain regions show significant volumetric differences between the melancholic and non-melancholic depression patient groups.
language of the presentation: Japanese
 
平尾 歩 M, second presentation, Mathematical Informatics: 池田 和司, 佐藤 嘉伸, 吉本 潤一郎, 久保 孝富 (Specially Appointed Associate Professor), 福嶋 誠, 日永田智絵
title: Detection of depression relapse based on longitudinal life-log data
abstract: Depression reduces quality of life, so its treatment requires careful consideration. Depression tends to recur after treatment, and early treatment has been reported to be important because patients who receive early treatment or intervention have a better prognosis. Some previous studies have used MRI to make predictions based on changes in brain structure, but the high cost and the time required to bring patients to the hospital make such measurements infrequent. In this study, we aim to predict relapse in remitted patients with depression from life-log data, which can be collected with a smartphone at lower cost than MRI and provides continuous measurements. Our short-term goals are 1) to identify life-log features that indicate relapse and 2) to confirm that relapse can be predicted from life-logs. We will select the features needed for detection and test whether relapse can be predicted from the life-log using methods such as ensemble learning.
language of the presentation: Japanese