GU YI | D, Interim Presentation | Biomedical Imaging | 佐藤 嘉伸, 加藤 博一, 大竹 義人, SOUFI Mazen
LIU ZHONGXUE | M, 1st Presentation | Cybernetics and Reality Engineering | 清川 清, 加藤 博一, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太朗
title: Exploring Virtual Pet Agents and Haptic Interaction: Innovative Ways to Relieve Work Stress
abstract: This study explores innovative ways to alleviate work stress through interaction with virtual pet agents and haptic technology. With the increasing prevalence of workplace stress and its adverse effects on mental health, finding effective stress-reduction interventions is critical. Both tactile interaction and contact with pets have a calming effect and can significantly aid stress management. This study compared the stress-reducing effects of interacting with a virtual pet, a robotic pet agent, and a robotic pet equipped with realistic tactile feedback, and investigated the role of haptic interaction in improving the effectiveness of virtual pet agents. Participants were assigned to groups interacting with virtual pets via virtual reality (VR), robotic pet agents, robotic pets with haptic feedback, or haptic-enhanced virtual pets, or to a control group without pet interaction. Stress levels were measured before and after the intervention using a subjective questionnaire (PSS-10) and physiological indicators (heart rate variability and galvanic skin response). The research offers a promising alternative for managing stress in work environments where real pets are not feasible. Future research should explore long-term effects and the potential for integrating such technologies into daily workplace wellness programs. language of the presentation: English
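The abstract does not specify how the physiological indicators were computed. As a hedged illustration only, a standard time-domain heart rate variability measure such as RMSSD (not necessarily the measure used in this study) can be derived from RR intervals as sketched below; the toy arrays stand in for real ECG-derived data.

```python
# Hypothetical sketch: a common pre/post HRV comparison.
# The study's actual analysis pipeline is not given in the abstract;
# RMSSD is a standard choice, not confirmed as the authors' method.
import numpy as np

def rmssd(rr_intervals_ms: np.ndarray) -> float:
    """Root mean square of successive RR-interval differences, a standard
    time-domain HRV measure (higher values generally indicate greater
    parasympathetic activity, i.e. a more relaxed state)."""
    diffs = np.diff(rr_intervals_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

# Toy RR intervals in milliseconds; real data would come from an ECG
# or PPG sensor recorded before and after the intervention.
pre_rr = np.array([810, 790, 805, 820, 798, 812], dtype=float)
post_rr = np.array([850, 830, 865, 840, 870, 845], dtype=float)

print(f"RMSSD before intervention: {rmssd(pre_rr):.1f} ms")
print(f"RMSSD after intervention:  {rmssd(post_rr):.1f} ms")
```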
AN ZHITING | M, 1st Presentation | Cybernetics and Reality Engineering | 清川 清, 向川 康博, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太朗
髙橋 舜 | D, Interim Presentation | Human-AI Interaction | Sakriani Sakti, 渡辺 太郎, 大内 啓樹, Faisal Mehmood
title: Modeling Spoken Language in the Quest of Alternative Representation to Text
abstract: Modern language technologies depend so heavily on text as the data representation of natural languages that “natural language processing” is typically equated with text processing. The nature of natural language, however, is speech: about 93% of the world’s languages are primarily oral, and few are written or read as standardized, institutional languages the way English and Japanese are. As a result, the overwhelming majority of the world’s languages cannot benefit from conventional language technologies. Against this backdrop, the present research aims to rebuild the foundation of spoken language technologies without relying on manually crafted text, starting instead from a quest for a machine-created, that is, unsupervised, alternative representation to text. More specifically, the machine discovery of text-like linguistic symbols from speech is formulated as deep Bayesian inference. The Bayesian view inevitably leads to the joint modeling of fundamental linguistic skills that correspond to speech recognition, speech (re)synthesis, and language modeling in conventional technologies, resulting in an integrated spoken language model. The proposed framework lays extensive groundwork for more advanced "text-free" spoken language technologies, such as speech translation and dialogue systems, that require both speech recognition and generation capabilities. In this presentation, promising results are reported that demonstrate the superiority of the proposed framework over previous approaches in the literature in terms of linguistic symbol discovery. language of the presentation: English
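The abstract does not spell out the model, but its description, where recognition, (re)synthesis, and language modeling are unified under Bayesian inference, is consistent with a variational factorization of the hedged form below. Here Z denotes the discovered symbol sequence standing in for text and X the speech signal; the exact formulation used in the research may differ.

```latex
% A hypothetical factorization, not the author's confirmed model:
% Z = latent linguistic symbols (text-like), X = observed speech.
\begin{align}
  p_\theta(X, Z) &= \underbrace{p_\theta(Z)}_{\text{language model}}
    \; \underbrace{p_\theta(X \mid Z)}_{\text{speech (re)synthesis}}, \\
  q_\phi(Z \mid X) &\approx p_\theta(Z \mid X)
    \quad \text{(speech recognition as approximate posterior inference)}, \\
  \log p_\theta(X) &\ge
    \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log p_\theta(X \mid Z)\right]
    - \mathrm{KL}\!\left(q_\phi(Z \mid X) \,\|\, p_\theta(Z)\right).
\end{align}
```

Under this reading, maximizing the evidence lower bound trains all three components jointly, which matches the abstract's claim that the Bayesian view forces an integrated spoken language model rather than separately built recognizers, synthesizers, and language models.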
TRAN QUANG CHUNG | D, Interim Presentation | Human-AI Interaction | Sakriani Sakti, 渡辺 太郎, 大内 啓樹, Faisal Mehmood
title: Beyond Textual Constraints: Synthesizing Human-Like Speech from Diverse Modalities, Including Text, Images, and Brain Signals
abstract: Speech synthesis technology enables machines to learn how to speak and is used in various applications, such as voice assistants and educational tools. Modern speech synthesis, based on deep learning, has advanced significantly in producing natural, human-like speech. However, the current speech synthesis framework, known as "text-to-speech" (TTS), relies solely on input text to generate speech, limiting its usability to those who can read and write. Additionally, deep learning-based TTS systems require large amounts of paired text-speech data, while almost half of the world's languages have no written form. Human speech, on the other hand, is not confined to the written word; it draws on a complex interaction of thoughts, sensory inputs, visual cues, and auditory signals. Inspired by this observation, this research proposes a speech synthesis system capable of generating speech from different modalities, including text, images, and brain signals. language of the presentation: English
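The abstract names the goal but not the architecture. One common design for this kind of multi-input synthesis, sketched below purely as an assumption, maps each modality through its own encoder into a shared embedding space consumed by a single speech decoder; all module names and dimensions here are illustrative placeholders, not the author's system.

```python
# Hypothetical architecture sketch (PyTorch), not the proposed system:
# per-modality encoders feed a shared speech decoder.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Projects one input modality into the shared embedding space."""
    def __init__(self, input_dim: int, shared_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class SharedSpeechDecoder(nn.Module):
    """Decodes shared embeddings into an acoustic feature sequence
    (e.g., mel-spectrogram frames), regardless of source modality."""
    def __init__(self, shared_dim: int = 256, n_mels: int = 80):
        super().__init__()
        self.rnn = nn.GRU(shared_dim, shared_dim, batch_first=True)
        self.out = nn.Linear(shared_dim, n_mels)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(z)
        return self.out(h)

# One encoder per modality; input dimensions are placeholders
# (token embeddings, CNN image features, EEG channels).
encoders = {
    "text": ModalityEncoder(input_dim=512),
    "image": ModalityEncoder(input_dim=2048),
    "eeg": ModalityEncoder(input_dim=64),
}
decoder = SharedSpeechDecoder()

eeg = torch.randn(1, 100, 64)        # (batch, time, channels)
mel = decoder(encoders["eeg"](eeg))  # -> (1, 100, 80) mel frames
print(mel.shape)
```

The appeal of a shared decoder in such a design is that paired speech data collected for one modality (e.g., text) can help the others, which matters when paired brain-signal or image data is scarce.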
古川 慧 | D, Interim Presentation | Human-AI Interaction | Sakriani Sakti, 渡辺 太郎, 須藤 克仁
title: Boundary-Driven Account for Downstep in Japanese and Applying Boundary-Driven Theory to Neural Sequence-to-Sequence Speech Synthesis
abstract: This study presents a novel approach to Japanese speech synthesis by applying the syntax-prosody mapping hypothesis and boundary-driven theory from linguistics. Focusing on the phonological phenomena of initial lowering and rhythmic boost, our research introduces a phonological model that significantly outperforms traditional methods in both objective and subjective evaluation experiments. The study also proposes new objective evaluation criteria for Japanese speech synthesis, offering a more rigorous and linguistically grounded methodology for assessing the quality of synthesized speech. The phonological model's ability to accurately reproduce initial lowering, a common phenomenon in speech, highlights its superior capability in reflecting syntactic structure variations through intonation. Additionally, the model generalizes and reproduces the rhythmic boost phenomenon despite its absence from the training data, underscoring the importance of learning phonological boundaries in speech synthesis. Our approach not only yields more natural-sounding speech but also enriches the field by incorporating complex linguistic theories into the computational process. This research thus marks a significant advance in the naturalness and linguistic accuracy of speech synthesis, with broader implications for computational linguistics and artificial intelligence. language of the presentation: English
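The abstract does not describe how prosodic boundaries are presented to the neural sequence-to-sequence model. One simple, purely illustrative possibility is to interleave ranked boundary tokens, derived from the syntax-prosody mapping, with the phoneme sequence; the token names and ranks below are hypothetical, not the paper's feature design.

```python
# Hypothetical sketch: interleaving prosodic boundary tokens with
# phonemes as input to a seq2seq TTS front end. Labels are illustrative.

# Boundary tokens ranked by prosodic strength: accentual phrase,
# intermediate phrase, utterance-final.
BOUNDARY_TOKENS = {1: "<ap>", 2: "<ip>", 3: "<utt>"}

def insert_boundaries(phrases: list[list[str]], strengths: list[int]) -> list[str]:
    """Flatten phoneme phrases, appending after each phrase the boundary
    token whose rank comes from the syntax-prosody mapping."""
    assert len(phrases) == len(strengths)
    seq: list[str] = []
    for phones, strength in zip(phrases, strengths):
        seq.extend(phones)
        seq.append(BOUNDARY_TOKENS[strength])
    return seq

# Toy example: three phrases with two weak boundaries and a final one.
phrases = [["n", "i", "w", "a"], ["n", "o"], ["n", "i", "w", "a"]]
strengths = [1, 1, 3]
print(insert_boundaries(phrases, strengths))
# ['n','i','w','a','<ap>','n','o','<ap>','n','i','w','a','<utt>']
```

Exposing boundary rank explicitly in the input would give a sequence-to-sequence model a direct handle on phenomena such as initial lowering at phrase onsets, consistent with the abstract's emphasis on learning phonological boundaries.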