Colloquium B Presentations

Date and time: Friday, June 16, 3rd period (13:30-15:00)


Venue: L2

Chair: 嶋利 一真
QI HELI M, 2nd presentation Intelligent Communication 中村 哲, 渡辺 太郎
title: Semi-supervised ASR Learning by Synthetic Speech
abstract: This research presents a novel open-source PyTorch-based approach for semi-supervised ASR training via the offline TTS->ASR chain. The TTS->ASR chain, a core component of the machine speech chain, uses synthetic speech generated by TTS models from unspoken text to augment ASR training data. Despite the proven effectiveness of this semi-supervised paradigm, many open-source toolkits lack an official implementation. To address this, we focus on building an efficient large-scale pipeline that incorporates NAR-TTS models, multi-GPU batch-level model inference, multi-dataloader batch generation, and on-the-fly data filtering. We report experiments on the TTS->ASR chain, including ablation studies on different types of unlabeled data, data filtering thresholds, batch composition, and real-to-synthetic data ratios. Experimental results on the LibriSpeech and LibriTTS datasets indicate a significant improvement in WER, affirming the utility of our approach in a semi-supervised setting.
language of the presentation: English
 
井手 佑翼 M, 2nd presentation Natural Language Processing 渡辺 太郎, 中村 哲, 大内 啓樹
title: Predicting Japanese Lexical Complexity for Non-Native Readers
abstract: Lexical complexity prediction (LCP) is the task of predicting how difficult words in a text are on a continuous scale. It plays a vital role in simplifying or annotating complex words to assist readers. To study lexical complexity in Japanese, we construct the first Japanese LCP dataset. Our dataset provides separate complexity scores for Chinese/Korean annotators and others to address the readers' L1-specific needs. In the baseline experiment, we demonstrate the effectiveness of a BERT-based system for Japanese LCP.
language of the presentation: Japanese
 
的川 雄飛 M, 2nd presentation Natural Language Processing 渡辺 太郎, 中村 哲, 大内 啓樹
title: Linking of Notations for Named Entities in Katakana Using Phonetic Similarity to Other Languages
abstract: This research links Katakana notations of named entities (NEs) without context to the appropriate IDs, based on phonetic similarity to notations in other languages. Research on linking NEs (entities denoted by proper names, such as place names and personal names) to the knowledge to which they correspond has been conducted for NEs that have surrounding context. However, linking NE notations without context is also necessary for tasks that use words in isolation, such as information retrieval. To this end, we convert Japanese Katakana NE notations and NE notations in other languages into the International Phonetic Alphabet (IPA), a set of symbols for describing the sounds of all languages, and calculate the phonetic similarity between them. Based on this similarity, we then link each Katakana NE notation without context to the correct ID in the database to which it belongs.
language of the presentation: Japanese
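Presentation title: Linking of Katakana Proper-Noun Notations Using Phonetic Similarity to Other Languages
Presentation summary: This research links Katakana named-entity notations without context to the appropriate IDs, based on phonetic similarity to other languages. Research on linking named entities (NEs), which are denoted by proper names such as place names and personal names, to the appropriate knowledge to which they correspond has been conducted for NEs with surrounding context. However, linking NE notations without context is also necessary for application to tasks that use words without context, such as information retrieval. In this research, we convert Japanese Katakana NE notations and NE notations in other languages into the International Phonetic Alphabet (IPA), a set of symbols for describing the sounds of all languages, and, based on the phonetic similarity calculated between them, link Katakana NE notations without context to the correct IDs in the database to which they belong.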
 

Venue: L3

Chair: 平尾 俊貴
HU XIAODAN D, interim presentation Cybernetics and Reality Engineering 清川 清, 向川 康博, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太郎
title: Development and Optimization of Smart Dimming Sunglasses Using a Single-layered Spatial Light Modulator
abstract: We present a smart dimming sunglasses system designed for individuals with photophobia, particularly those who are sensitive to light intensity. Our system incorporates a spatial light modulator (SLM) that selectively blocks incoming light through an occlusion mask, which consists of low-transmittance areas on the SLM. However, with a single-layered SLM the mask is out of focus, which can lead to insufficient blocking and makes morphological expansion of the mask necessary. This expansion may in turn result in occlusion leakage, as observed in camera systems. This research aims to address the challenge of achieving complete occlusion with a single-layered SLM by determining the optimal expansion radius for the occlusion mask, considering both a camera system and the human visual system.
language of the presentation: English
 
XIA WEI M, 2nd presentation Cybernetics and Reality Engineering 清川 清, 加藤 博一, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太郎
title: A Visual Guidance Method Using Head Redirection for Re-Experience of 360-degree Video Viewing
abstract: When users watch 360-degree videos on head-mounted displays (HMDs), free viewing can lead to different visual experiences. To offer similar visual experiences across multiple users without undermining the diversity of user experiences or each user's personal experience, we propose a solution that subtly guides users to re-experience the perspective of a prior viewer. This research introduces a method that manipulates the rotation of the virtual scene in accordance with the user's head movement, providing visual guidance within a 360-degree range. Setting the prior viewer's head direction as the guidance target is expected to let the user re-experience that viewer's perspective. Experimental results confirm that the proposed method brings the user's head direction closer to that of the prior viewer without interfering with the user's visual experience.
language of the presentation: Japanese
 
WEI XIN M, 2nd presentation Cybernetics and Reality Engineering 清川 清, 佐藤 嘉伸, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太郎
title: Unobtrusive Refractive Power Monitoring: Using EOG to Detect Blurred Vision
abstract: Population growth and aging have led to a significant increase in the number of individuals affected by common causes of vision loss. Early diagnosis and treatment are crucial to avoid the consequences of visual impairment. However, many visual problems are difficult to detect in their early stages. Visual adaptation can compensate for several visual deficits through adaptive eye movements, and these movements may serve as indicators of vision loss. In this work, we investigate the association between eye movement and blurred vision. Using electrooculography (EOG) to record eye movements, we propose a new tracking model to identify the deterioration of refractive power. We verify the technical feasibility of this method with a blurred vision simulation experiment, in which six sets of prescription lenses and a pair of flat lenses were used to create different levels of blurring. We analyzed binocular movements through EOG signals and performed a seven-class classification using the ResNet-18 architecture. The results revealed an average classification accuracy of 94.7% for the subject-dependent model. However, the subject-independent model performed poorly, with the highest accuracy reaching only 34.5%. These results demonstrate the potential of an EOG-based visual quality monitoring system. Furthermore, our experimental design provides a novel approach to assessing blurred vision.
language of the presentation: English