岡本 真由子 | M, 2nd presentation | Augmented Human Communication | 中村 哲, 杉本 謙二, 作村 諭一(BS), Sakriani Sakti |
title: Phoneme-Level Speaking Rate Variation on Waveform Generation using GAN-TTS
abstract: The development of text-to-speech synthesis (TTS) systems continues to advance, and the naturalness of their generated speech has significantly improved. However, most current TTS systems learn from data within a deep learning framework and generate output at a monotonous speaking rate. In contrast, humans vary their speaking rate and tend to slow down to emphasize words, distinguishing the elements of focus in an utterance. Unfortunately, recording natural speech at various speaking rates is expensive and time-consuming. This paper constructs synthetic and natural speech corpora with variable speaking rates and analyzes the main differences between the speaking rates of natural and synthetic data. We then develop a generative adversarial network (GAN) based TTS system that enables waveform generation with phoneme-level speaking rate variations. language of the presentation: Japanese | |||
本村 駿乃介 | M, 2nd presentation | Augmented Human Communication | 中村 哲, 杉本 謙二, 国田 勝行(BS), 田中 宏季 |
title: EEG-based Detection of Semantic Anomalies in Spoken Sentences with Sequential Attention Models
abstract: We propose a method based on attention-based recurrent neural networks (ARNN) for detecting semantic anomalies in spoken sentences from single-trial electroencephalogram (EEG) signals. EEG responses to semantic anomalies in sentences are well known, but few studies have reported detecting them from single-trial EEG. Seventeen participants listened to spoken sentences, some of which contained semantically anomalous words, and judged whether each sentence was correct by pressing a button on a keyboard while we recorded their EEG signals. We used the EEG signals spanning each entire sentence, which makes it possible to classify the correctness of a sentence without information about the onset of the anomalous word. An ARNN fed with raw EEG signals achieved a classification accuracy of 63.5%, significantly above chance level. The attention weights showed that, in some cases, the predictions depended on feature vectors temporally close to the onset of the anomalous word. language of the presentation: Japanese | |||
FITO WIGUNANTO HERMINAWAN | M, 2nd presentation | Network Systems | 岡田 実, 杉本 謙二, 東野 武史, Dong Duong Thang |
title: Investigation of Cross Modulation for SC-FDMA Signals in Radio over Fiber Mobile Link
abstract: This research investigates the performance of the uplink system in radio over fiber (RoF). A block diagram of the system is presented, and results are obtained through both theoretical analysis and simulation. For the uplink, SC-FDMA has been recommended by 3GPP LTE, and the receiver side of the mobile link needs to improve the performance of the mobile terminal without any changes to the LTE frame. Analog RoF is used to meet this need, but it carries the risk of cross modulation in the transmitted signal. The RoF channel uses a Mach-Zehnder modulator, in which the magnitude of the distortion depends on the normalized optical modulation index. language of the presentation: English | |||