Colloquium B Presentations

Date: July 22 (Tue), 2nd period (11:00-12:30)


Venue: L1

Chair: 鶴峯 義久
中川 郁仁 M, 2nd presentation, Mathematical Informatics: 池田 和司, 田中 沙織, 久保 孝富, 日永田 智絵, LI YUZHE
title: Analysis of the effects of emotions, visual stimuli, and physiological responses on subjective taste evaluations of beverages
abstract: Taste is an important sense because it directly influences food selection and consumption, which in turn greatly affects nutritional status and health. However, the relative effects of emotions, visual stimuli, and physiological responses on taste are not yet fully understood. Therefore, this study analyzed the effects of human emotions, visual stimuli based on color, and physiological responses on the subjective taste of beverages. This study aims to provide insights that contribute to the design of sustainable dietary patterns that support health by elucidating the mechanisms underlying taste changes.
language of the presentation: Japanese
Presentation title: Analysis of the effects of emotions, visual stimuli, and physiological responses on subjective taste evaluations of beverages
Presentation abstract: Taste is an important sense because it directly influences food selection and intake, which in turn greatly affects nutritional status and health. However, the relative effects of emotions, visual stimuli, and physiological responses on taste are not yet fully understood. This study therefore analyzed the effects of human emotions, color-based visual stimuli, and physiological responses on the subjective taste of beverages. By elucidating the mechanisms underlying taste changes, this study aims to provide insights that contribute to the design of sustainable, health-supporting dietary patterns.
 
廣中 高太郎 M, 2nd presentation, Mathematical Informatics: 池田 和司, 田中 沙織, 久保 孝富, 日永田 智絵, LI YUZHE
title: Analysis of local field potentials in the dorsal premotor cortex of monkeys during a shape manipulation task: comparison with the lateral prefrontal cortex
abstract: The lateral prefrontal cortex (lPFC) and dorsal premotor cortex (PMd) are closely related anatomically and play crucial roles in behavioral planning and preparation based on cognitive information. However, how local field potentials (LFPs) are modulated during action selection remains to be elucidated. Here, we investigated task-related LFP modulations in the lPFC and PMd while monkeys performed a shape manipulation task that required stepwise actions based on visual shape perception.
language of the presentation: English
 
西谷 香紀 M, 2nd presentation, Brain and Behavior Modeling: 田中 沙織, 池田 和司, 久保 孝富, 荻島 大凱
title: The Effect of Psychological Distance on Pain Avoidance Learning
abstract: Empathy for others' pain is essential in social life, but its degree depends on psychological distance. This study investigates how increased psychological distance affects learning processes related to avoiding others' pain, within the framework of model-free and model-based learning. Previous studies suggest that learning to avoid others' pain is more model-free than learning to avoid one's own pain. However, this interpretation faces theoretical challenges. We hypothesize that increased psychological distance reduces information gain, thereby increasing model-free behavior. Through behavioral experiments, we examine whether this shift toward model-free behavior in response to others' pain truly occurs and how empathy influenced by psychological distance modulates decision-making processes.
language of the presentation: Japanese
 
日垣 輝大 D, interim presentation, Optical Media Interface: 向川 康博, 清川 清, 舩冨 卓哉, 藤村 友貴, 北野 和哉
title: Refractive Epitrochoid Sampling for Light Field Measurement using Wedge Prisms
abstract: Conventional light field measurement using a micro-lens array suffers from grid-like artifacts in the refocused image because the sub-aperture images are sampled at regular, linear intervals in a grid pattern. In this study, we propose a method for measuring the light field using a pair of wedge prisms, together with a method for calibrating the sub-aperture positions. The epitrochoid trajectory of rays generated by the rotating prisms yields non-uniform, variable sampling of the sub-aperture positions. We designed and calibrated the optical system, and verified the effectiveness of the proposed method on refocused images using the measured light field.
language of the presentation: Japanese
Presentation title: Light Field Measurement Using a Pair of Wedge Prisms
Presentation abstract: Conventional light field measurement suffers from a fixed number of samples imposed by the micro-lens optics, and from grid-like sampling of the sub-aperture images, which produces unnatural blur during image processing. In this study, we propose a method for measuring the light field at non-uniform intervals with a refractive optical system using a pair of wedge prisms, together with a method for calibrating the viewpoint positions. By acquiring sub-aperture images along the epitrochoid trajectory of rays generated by the rotating prisms, the method realizes variable, non-uniformly spaced sampling of viewpoint positions. We performed a quantitative evaluation in simulation, designed and calibrated the optical system, and verified the effectiveness of the method through refocusing with the measured light field.
 

Date: July 22 (Tue), 2nd period (11:00-12:30)


Venue: L2

Chair: 嶋利一真
西田 悠人 D, interim presentation, Natural Language Processing: 渡辺 太郎, Sakriani Sakti, 上垣外 英剛
title: Towards Understanding the Performance Limitations of Language Models
abstract: Understanding the performance limitations of language models is essential for their appropriate use and, in the long run, for improving the models themselves. In this study, we focus on kNN-LM, a language model that utilizes external memory, and demonstrate that although it is expected to improve the prediction of low-frequency words, it in fact often degrades performance. We identify the cause of this limitation as the sparse and entangled distribution of low-frequency word embeddings in the representation space, which leads to reduced retrieval performance.
language of the presentation: Japanese
Presentation title: Towards Understanding the Performance Limitations of Language Models
Presentation abstract: Clarifying the constraints on language model performance leads to appropriate use of language models and, ultimately, to improving the models themselves. In this study, we show that kNN-LM, a language model that exploits external memory, often degrades performance in practice even though it is expected to improve the prediction of low-frequency words. As the cause, we show that the embeddings of low-frequency words are distributed sparsely and entangled in the representation space, which lowers retrieval accuracy.
 
ZHOU WANGZIXI M, 2nd presentation, Human-AI Interaction: Sakriani Sakti, 渡辺 太郎, 大内 啓樹, Faisal Mehmood, Bagus Tris Atmaja
title: Adaptive Personalized Emotional TTS through Human-in-the-Loop Learning of Individual Emotion Perception Spaces
abstract: Existing emotional text-to-speech (TTS) systems primarily focus on discrete emotion labels, failing to capture the subtle, continuous nuances of human emotion. In contrast, utilizing emotional dimensions is a more promising approach that offers finer control. However, human emotion perception is highly subjective and personalized, so generalized models cannot satisfy everyone's unique preferences. In this work, we propose a personalized emotional TTS system. We integrate a human-in-the-loop feedback mechanism to adapt the model to individual users' unique emotional perceptions, thereby optimizing a personalized Russell's model for each user.
language of the presentation: English
 
JAN MEYER SARAGIH M, 2nd presentation, Human-AI Interaction: Sakriani Sakti, 渡辺 太郎, 大内 啓樹, Faisal Mehmood, Bagus Tris Atmaja
title: Improving Speech Synchronization in Automatic Dubbing through Diverse Candidates Generation
abstract: Dubbing is the process of overlaying the original audio track with another track spoken in a different language. Recent work on this topic has modified machine translation to also consider the source speech duration, outputting both the translation and its duration. While this approach improves speech synchronization, the additional target in machine translation reduces translation quality. We introduce a framework that leverages N-best machine translation outputs and LLM-based paraphrasing to generate a variety of translation candidates. These candidates are used to generate speech that is as close as possible to the source audio, achieving high speech synchronization without sacrificing much translation quality or naturalness.
language of the presentation: English
 
迫田 正太 M, 2nd presentation, Cybernetics and Reality Engineering: 清川 清, Sakriani Sakti, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太朗
title: Development of Character-Driven Multimodal Text-to-Speech Synthesis Method
abstract: In creative fields such as video game or anime production, character design often precedes voice selection. However, generating suitable voices using existing text-to-speech (TTS) systems is labor-intensive, requiring the collection of target speaker data, repeated fine-tuning, and careful adjustment for emotional expression. This research proposes a character-driven multimodal TTS method that uses character face images and motion information to automatically predict speaker traits and prosody, enabling natural and expressive speech synthesis. The proposed model is built with a multimodal architecture incorporating CLIP-based encoders and is trained on a custom dataset consisting of face images, voice samples, and text. Objective and subjective evaluations are conducted to assess speaker similarity, speech naturalness, and compatibility between the synthesized voice and the character.
language of the presentation: Japanese
Presentation title: Development of a Character-Driven Multimodal Text-to-Speech Synthesis Method
Presentation abstract: In creative work, character design often precedes voice selection, and generating a voice that matches a character with existing speech synthesis technology takes considerable effort. This study proposes a character-driven multimodal speech synthesis method that automatically predicts speaker identity and prosodic style from a character's face image and motion during speech, enabling natural speech suited to the character. The proposed method uses CLIP-based multimodal encoders to integrate face image, motion, and speech information, forming a TTS model that controls speaker identity and style. We construct a custom dataset of face images, speech, and text, and verify speaker similarity, speech naturalness, and affinity with the character through objective and subjective evaluations.
 

Date: July 22 (Tue), 2nd period (11:00-12:30)


Venue: L3

Chair: Ahmad Kamal Nasution
酒井 ちひろ M, 2nd presentation, Cybernetics and Reality Engineering: 清川 清, 荒牧 英治, 内山 英昭, Perusquia Hernandez Monica, 平尾 悠太朗
title: Development of a Robotic System for Individuals with Physical Disabilities to Support Gaze Visualization and Visual Exploration
abstract: The objective of this study is to develop a robotic system for individuals with severe motor impairments that delivers gaze visualization and visual exploration support. Such individuals often experience speech difficulties and restricted visual fields due to nerve damage or muscle weakness, which limits the expression of intent and points of interest and makes environmental awareness challenging, thereby increasing risks to personal safety. The proposed system equips users with the ability to freely inspect areas of interest in their surroundings and to visualize gaze information as a form of pseudo-pointing that conveys points of attention to others. Adopting an inclusive development process that involves both users and caregivers, the study iteratively integrates feedback to refine design requirements for gaze visualization, visual exploration support, the robot’s exterior, and the graphical user interface (GUI) on a personal computer.
language of the presentation: Japanese
Presentation title: Development of a Robotic System for Individuals with Physical Disabilities to Support Gaze Visualization and Visual Exploration
Presentation abstract: The objective of this study is to develop a robotic system for individuals with physical disabilities that realizes gaze visualization and visual exploration support. Because of symptoms such as speech difficulty and restricted visual fields caused by nerve damage or muscle weakness, such individuals are limited in expressing intentions and objects of attention, and environmental awareness is difficult, so their personal safety is more likely to be threatened. The proposed system provides a function that lets the user freely inspect regions of interest in the surroundings, and a function that visualizes gaze information to present pseudo-pointing to those around them. Adopting an inclusive development process involving individuals with physical disabilities and their caregivers, this study iterates development and user feedback to clarify design requirements for gaze visualization and visual exploration support, the robot's exterior, and the graphical user interface (GUI) on a PC.
 
清水 美緒奈 M, 2nd presentation, Social Computing: 荒牧 英治, 清川 清, 若宮 翔子, Peng Shaowen
title: Landmark Extraction for Generating Clarification Questions to Resolve POI Ambiguity
abstract: Geographic ambiguity in natural language place expressions presents a persistent challenge in geocoding. While prior work has leveraged contextual cues to infer intended locations, such information is often insufficient for precise disambiguation. To address this, we propose an interactive framework that resolves ambiguity between candidate Points of Interest (POIs) by generating clarification questions based on spatially discriminative landmarks. Our method uses a vision-language model (VLM) to analyze 360-degree street view imagery around each POI and generate textual clues that highlight visual features unique to one location. We compiled a dataset of ambiguous POI pairs across diverse regions and categories in Japan and conducted a crowdsourced evaluation of the generated clues. Results show that disambiguation effectiveness varies by region, with dense urban areas, where visual environments are more homogeneous, posing greater challenges for clue generation.
language of the presentation: Japanese
Presentation title: Landmark Extraction for Generating Clarification Questions to Resolve Ambiguity in Place Expressions
Presentation abstract: The geographic ambiguity of place expressions in natural language is an important challenge in geocoding. While previous work has shown the usefulness of the context in which a place expression appears, the location often cannot be identified even with contextual information. This study therefore extracts landmarks useful for discriminating among multiple candidate Points of Interest (POIs), with the aim of generating clarification questions that help users pinpoint the location of an ambiguous place expression. Specifically, 360° Street View images around each POI candidate are analyzed with vision-language models (VLMs) to generate textual clues describing visual features unique to a specific location. We ran experiments on a dataset of ambiguous POI pairs spanning diverse regions and categories in Japan, and evaluated the usefulness of the generated clues through crowdsourcing. The results confirmed a tendency for model performance to drop in urban areas where the surrounding environments are similar.
 
祖父江 智子 M, 2nd presentation, Social Computing: 荒牧 英治, 清川 清, 若宮 翔子, Peng Shaowen
title: The Impact of Social Media Use on Well-Being
abstract: Today, many people use social media, and its influence continues to grow. However, there are divided opinions regarding the impact of social media use on users' well-being. This study aims to clarify the relationship between social media use and well-being by categorizing social media use into three elements: platform, purpose, and behavior. Specifically, we will employ the experience sampling method (ESM) to conduct a two-week survey via crowdsourcing, collecting data on these elements of social media use and users’ well-being after use. Based on the collected data, we will analyze how each element is associated with well-being. Furthermore, we will conduct an intervention experiment to explore the causal relationship between these elements and well-being.
language of the presentation: Japanese
Presentation title: Investigating the Impact of Social Media Use on Well-Being
Presentation abstract: In modern society, many people use social media, and its influence continues to grow. However, opinions are divided on the impact social media has on users' well-being. This study divides social media use into three elements, namely platform, purpose, and behavior, and aims to clarify how each element relates to well-being. Specifically, using the experience sampling method, we conduct a two-week crowdsourced survey on the elements of social media use and well-being after use, and analyze the association with well-being for each element. We also aim to elucidate the causal relationship between these elements and well-being through an intervention experiment.
 
稲積 駿 D, interim presentation, Social Computing (Robot Dialogue Intelligence): 荒牧 英治☆, 吉野 幸一郎, 河野 誠也
title: Towards Contextual Understanding of Real-world Dialogues
abstract: Multimodal reference resolution, including phrase grounding, aims to understand the semantic relations between mentions and real-world objects. Phrase grounding between images and their captions is a well-established task. In contrast, for real-world applications, it is essential to integrate textual and multimodal reference resolution to unravel the reference relations within dialogue, especially in handling ambiguities caused by pronouns and ellipses. This work presents a framework that unifies textual and multimodal reference resolution by mapping mention embeddings to object embeddings and selecting mentions or objects based on their similarity. Our experiments show that learning textual reference resolution, such as coreference resolution and predicate-argument structure analysis, positively affects performance in multimodal reference resolution. In particular, our model with coreference resolution performs better in pronoun phrase grounding than representative models for this task, MDETR and GLIP. Our qualitative analysis demonstrates that incorporating textual reference relations strengthens the confidence scores between mentions, including pronouns and predicates, and objects, which can reduce the ambiguities that arise in visually grounded dialogues.
language of the presentation: Japanese