Seminar Presentations

Date and time: Friday, September 29, 4th period (15:10-16:40)


Venue: L1

Chair: 能地 宏
柳田 智也 1651115: M, 2nd presentation 知能コミュニケーション 中村 哲, 松本 裕治, Sakriani Sakti
title: Incremental text to speech system for simultaneous speech translation system
abstract: A speech translation system consists of three components: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). In the traditional approach, ASR starts after the speaker has spoken the whole sentence, and translation and synthesis are then performed sentence by sentence. Standard TTS requires linguistic information for the full sentence. Because spoken utterances such as lectures can be very long, this approach can cause a significant delay: the speech translation system cannot output speech before the speaker finishes the utterance. To deal with this problem, several studies have proposed incremental TTS (ITTS), in which the system synthesizes speech in smaller synthesis units such as words. However, its quality is still much lower than that of standard TTS. Furthermore, no ITTS exists for Japanese yet, and its synthesis unit has not been decided. In this talk, I determine the synthesis unit for Japanese ITTS and examine the resulting quality.
language of the presentation: Japanese
presentation title: Incremental speech synthesis system for simultaneous interpretation
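presentation summary: A speech translation system consists of three components: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). In the conventional approach, ASR starts recognition only after the speaker has finished the whole utterance, and translation and speech synthesis are then carried out sentence by sentence. Standard TTS requires linguistic information for the whole sentence. As a result, delays arise when the spoken utterances are long, as in lectures, and a simultaneous interpretation system cannot output speech before the speaker finishes the utterance. To address this problem, several researchers have proposed implementing incremental TTS (ITTS), which synthesizes speech in synthesis units such as words. However, the speech quality of ITTS is lower than that of standard TTS. Moreover, ITTS has not yet been implemented for Japanese, and its synthesis unit has not been decided. In this presentation, the synthesis unit for Japanese ITTS is determined and its quality is investigated.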
 
寺澤 直人 1651074: M, 2nd presentation 知能コミュニケーション 中村 哲, 松本 裕治, 田中 宏季, Sakriani Sakti
title: Tracking Liking State in Brain Activity while Watching Multiple Movies
abstract: Emotion is valuable information in various applications ranging from human-computer interaction to automated multimedia content delivery. Conventional methods for recognizing emotion are based on speech prosody cues, facial expressions, and body language. However, these cues may not appear when people watch a movie. In recent years, some studies have started to use electroencephalogram (EEG) signals to recognize emotion. However, these studies analyzed the EEG data of each movie scene as a whole for emotion classification, so detailed information about changes in emotional state cannot be extracted. In this study, we use EEG to track the affective state while subjects watch multiple movies. In our experiments, we measured the continuous liking state while subjects watched three types of movies and then constructed subject-dependent models for tracking the emotional state. We used a support vector machine (SVM) for classification and support vector regression (SVR) for regression. The best classification accuracy was 77.6%, and the best regression model achieved a correlation coefficient of 0.645 between the actual and predicted liking states. These results demonstrate that the continuous emotional state can be predicted by our EEG-based method.
language of the presentation: Japanese
 
本田 将大 1651095: M, 2nd presentation 知能コミュニケーション 中村 哲, 松本 裕治, 田中 宏季, Sakriani Sakti
title: Detecting Emotional Suppression in the Presence of Disgust by Time Series Change of Cerebral Blood Flow using fNIRS
abstract: Emotion suppression is the conscious inhibition of emotionally expressive behavior while an emotion is aroused. In human-to-human interaction, speakers who suppress their emotions cannot convey their internal affective states to the interlocutor. We propose a new technique for recognizing emotional suppression that humans may not perceive. In this study, to detect the suppression of negative emotion, we used functional near-infrared spectroscopy (fNIRS) to measure cerebral blood flow while subjects were watching and speaking with neutral and disgust emotions. On the time-series data of cerebral blood flow, we performed statistical tests to confirm whether the mean value and the changes in cerebral blood flow differ significantly between the emotions. We then applied classification models to detect suppressed emotions automatically. The results showed that our models could detect the suppressed emotions with over 70% accuracy.
language of the presentation: Japanese
 
渡部 宏樹 1661021: D, interim presentation 知能コミュニケーション 中村 哲, 松本 裕治, 田中 宏季, Sakriani Sakti
title: Subject-independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response during Speech Perception
abstract: Recent speech perception models propose that neural oscillations in the theta band phase-lock to the speech envelope to extract syllabic information, while rapid temporal information is processed by a corresponding higher frequency band (e.g., low gamma). It has been suggested that phase-locked responses to acoustic features show consistent patterns across subjects. A previous magnetoencephalography (MEG) experiment showed that subject-dependent template matching classification based on theta phase patterns could discriminate three English spoken sentences. In this research, we apply electroencephalography (EEG) to the discrimination of Japanese spoken sentences and investigate the performance in various settings: (1) template matching and support vector machine (SVM) classifiers; (2) subject-dependent and subject-independent models; (3) multiple frequency bands, including theta, alpha, beta, low gamma, and the combination of all frequency bands. The performance in almost all settings was higher than the chance level. While the performance of SVM and template matching did not differ, models using the combination of multiple frequency bands outperformed those trained on a single frequency band. The best accuracies of the subject-dependent and subject-independent models were 55.2% (SVM on the combination of all frequency bands) and 44.0% (template matching on the combination of all frequency bands), respectively.
language of the presentation: Japanese
 

Venue: L2

Chair: 田中 賢一郎
小野 真理子 1651030: M, 2nd presentation 生体医用画像 佐藤 嘉伸, 向川 康博, 大竹 義人, 横田 太
title: Enhancement of renal arterial branches in abdominal CT images using Convolutional Neural Network
abstract: Since abdominal arteries have complicated branching structures, presenting the structural information of the arteries to a doctor is important for precise diagnosis. It is therefore valuable to extract the vascular region from a 3D-CT image automatically and use it for image diagnosis and treatment planning. In previous work, line enhancement filters have been widely used as preprocessing for artery extraction and recognition. However, with this conventional processing, thin blood vessels and vessel bifurcations are frequently missed, and false enhancement of image edges also occurs. In our method, we use a convolutional neural network (CNN). In our experiment, 30 contrast-enhanced 3D-CT images were used for training and validation in a leave-one-out cross-validation. The experiment showed that our method has better enhancement accuracy than the conventional method.
language of the presentation: Japanese
 
中谷 聡志 1651081: M, 2nd presentation 生体医用画像 佐藤 嘉伸, 向川 康博, 大竹 義人, 横田 太
title: Segmentation and estimation of trajectory of surgical tools for laparoscopic images
abstract: Laparoscopic surgery requires a high level of expertise, so it takes novice surgeons a long time to master the skills. Technical education for surgery is based on sensory information, which makes quantitative and objective assessment difficult. We aim to assess surgical skill by comparing quantitative metrics such as the tool tip trajectory. In this presentation, we present the segmentation of surgical tools using convolutional neural networks and the estimation of tool tip trajectories in three dimensions.
language of the presentation: Japanese
 
溝口 拓也 1651102: M, 2nd presentation サイバネティクス・リアリティ工学 清川 清 ☆, 向川 康博, 諏訪 正樹 (visiting), 井尻 善久 (visiting)
title: Dot-matrix character extraction method in Factory Automation
abstract: Dot-matrix characters are characters composed of dots that are printed or laser-marked on products for product management. They are used for reasons of cost and ease of operation, and if machines could recognize them automatically, it would be a great contribution to industry. Character recognition requires input images that each consist of an individual character, so a step that extracts each character from the image is indispensable. However, for characters whose minimum components are dots, it is difficult to use conventional methods based on character connectivity (SWT) or character width information (MSER). In this research, we propose a top-down method that selects a rough character candidate region from a natural image and gradually narrows down the character candidates based on dot characteristics. We first obtain a rough circumscribed rectangle with our own hybrid method, which combines Selective Search and feature point detection. Within the obtained rectangle, each superpixel region produced by the SLIC method is classified as either a character or a non-character region. We then calculate intensity gradients for the resulting fine character area and cut out individual characters by analyzing the gradient tendencies of both characters and background. In future work, we will evaluate our method under actual operating conditions, which involve illumination changes, shadows, and complicated backgrounds.
language of the presentation: Japanese
presentation title: Proposal of a dot-matrix character extraction method for FA (factory automation)
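presentation summary: A dot-matrix character is a character composed of a dot matrix and is used for purposes such as product management. Dot-matrix characters are used for reasons of cost and operation, and if machines could recognize them automatically, it would be a great contribution to industry. Character recognition tasks require input images that each contain a single character, so a step that cuts out individual characters is generally needed. However, for dot-matrix characters, whose minimum components are dots, conventional methods that use character connectivity (SWT) or character width information (MSER) cannot be applied. In this research, we propose a top-down method for such characters that selects rough character candidate regions from a scene image and narrows down the candidates step by step. The extraction procedure first obtains a rough circumscribed rectangle with a method that hybridizes Selective Search and feature point detection. SLIC is then applied to the obtained rectangle, and each resulting superpixel is classified as a character or non-character region. Finally, intensity gradients are computed for the resulting fine character region, and individual characters are cut out by analyzing the gradient tendencies of characters and background. In future work, we will apply the method to data that assume actual operation, including illumination changes, and verify its accuracy.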