Colloquium B Presentations

Date: July 22 (Tue), Period 1 (9:20-10:50)

Room: L1

Chair: Yi Gu

白川　琢磨	M, 2nd presentation	Brain and Behavior Modeling	田中　沙織,	清川　清,	CAI LIN,	荻島　大凱
title: Longitudinal Relationship of Adolescent Brain Development, Environmental Factors, and Problem Behaviors abstract: Adolescence is a critical period marked by significant mental and physical changes. As a result, adolescents are more likely to experience problem behaviors. Previous research has highlighted the importance of considering the interactions among brain development, environmental factors such as child-family relationships, and behavior. However, few longitudinal studies have examined how early adolescent brain development and environmental factors affect later problem behaviors. This study addresses this gap by using longitudinal data from the Tokyo Teen Cohort to investigate the causal relationships between environmental factors, brain activity, and the emergence of problem behaviors over time. By analyzing developmental changes and causal relationships, the research aims to identify key factors that influence adolescent problem behaviors and contribute to improved support for adolescents and guidance for parents. language of the presentation: Japanese

成田　大祐	M, 2nd presentation	Brain and Behavior Modeling	田中　沙織,	清川　清,	川島　一朔,	荻島　大凱
title: Detecting Mind Wandering With Minimal Equipment abstract: Mind wandering occurs when a person's attention drifts away from the task at hand, leading to reduced focus and productivity. This study investigates the detection of mind wandering using minimal equipment, focusing on brain signals and eye movements. By utilizing compact devices, particularly single-channel in-ear EEG sensors with optimized preprocessing techniques and eye-tracking technology, the research explores the feasibility of accurately identifying mind wandering in everyday scenarios. Tests with 40 subjects show that detection accuracy varies between individuals and that preprocessing is crucial for reliable signal quality across subjects. The findings highlight the potential for practical applications of this technology in daily life and point to the need for personalized calibration. language of the presentation: English

大武　一平	D, interim presentation	Optical Media Interface	向川　康博,	清川　清,	舩冨　卓哉,	藤村　友貴,	北野　和哉
title: Human Pose Update Using Event-based Camera to Improve Its Accuracy abstract: Human pose estimation that enables real-time interaction between performers and visual effects is increasingly sought after in entertainment applications. The present study targets such interactivity by sequentially estimating 3D human poses from frames captured by a time-of-flight (ToF) camera and synthesizing visual effects (VFX) from the resulting poses. In online operation, however, the processing time required for pose estimation creates a temporal gap between frame acquisition and pose output, reducing both accuracy and real-time responsiveness. To address this issue, a latency-compensation scheme is introduced that leverages the low latency and high temporal resolution of an event-based camera. Each ToF-based pose estimate is first updated to reflect the most recent pose, and the result is then lifted to 3D on a timestamp basis, thereby preserving real-time performance. Because updates are triggered after a fixed number of events, the procedure adapts to the magnitude of the performer's motion. In addition, optical filters separate the operating wavelengths of the VFX, the ToF camera, and the event-based camera, eliminating mutual interference and enabling robust sensing. Together, these features demonstrate the method's suitability for practical dance-entertainment scenarios. language of the presentation: Japanese
Japanese title: Latency Compensation and Accuracy Improvement by Updating Human Pose Estimation Results with an Event Camera Japanese abstract: There is demand for content, built on pose estimation technology, that allows dynamic interaction between performers and effects. This study aims to realize interactive content by sequentially estimating 3D poses from ToF camera frames and compositing VFX from the inferred results. In an online process, however, the processing time for pose estimation creates a gap between the time a frame is captured and the time the pose estimate is output; this gap degrades estimation accuracy and undermines real-time performance. We therefore introduce latency compensation that exploits the low latency and high temporal resolution of an event camera: the pose estimate is updated to match the latest pose and then lifted to 3D on a timestamp basis, ensuring real-time performance. We also show that, because updates are performed based on event counts, this compensation adapts to the magnitude of human motion, and that by using filters to separate the wavelengths each component uses, the effects, the ToF camera, and the event camera can operate without interfering with one another, making the approach applicable to real dance entertainment.
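A minimal Python sketch of the event-count trigger described above, assuming events arrive one at a time as (timestamp, x, y, polarity) tuples. The class name `EventCountTrigger`, the threshold `n_events`, and the simulated event rates are hypothetical; the abstract does not give the actual threshold or update procedure. The sketch only illustrates why counting events, rather than waiting a fixed time, makes update frequency track motion: faster motion produces more events and therefore more updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An event from an event-based camera: (timestamp_us, x, y, polarity).
Event = Tuple[int, int, int, int]


@dataclass
class EventCountTrigger:
    """Hypothetical helper: calls `on_update` after every `n_events` events."""
    n_events: int
    on_update: Callable[[List[Event]], None]
    _buffer: List[Event] = field(default_factory=list)

    def push(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.n_events:
            # Hand the accumulated batch to the (hypothetical) pose updater
            # and start collecting a fresh batch.
            batch, self._buffer = self._buffer, []
            self.on_update(batch)


def count_updates(event_rate_per_ms: int, duration_ms: int, n_events: int) -> int:
    """Simulate a constant event rate and return how many updates fire."""
    update_times: List[int] = []
    trigger = EventCountTrigger(n_events, lambda batch: update_times.append(batch[-1][0]))
    for t in range(duration_ms):
        for _ in range(event_rate_per_ms):
            trigger.push((t * 1000, 0, 0, 1))
    return len(update_times)


# Slow motion yields few events, fast motion many, so updates adapt.
print(count_updates(event_rate_per_ms=10, duration_ms=100, n_events=500))   # 2
print(count_updates(event_rate_per_ms=100, duration_ms=100, n_events=500))  # 20
```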

Date: July 22 (Tue), Period 1 (9:20-10:50)

Room: L2

Chair: 平尾悠太朗

蒔苗　茉那	D, interim presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介,	須藤　克仁
title: Integrating Simultaneous Interpretation Tactics into Simultaneous Speech Translation for Better Quality and Latency Trade-off abstract: Simultaneous speech translation involves generating translations before the speaker has finished speaking, which requires a balance between translation quality and latency. This study explores whether large language models (LLMs) can mimic interpretation strategies commonly used by human interpreters, and whether doing so can improve the trade-off between quality and latency. Experimental results show that LLMs can mimic tactics such as segmentation (also known as the salami technique) and omission. Under the current system and evaluation methods, segmentation yields a clear advantage, while the benefits of omission remain unclear. language of the presentation: English

大竹　啓永	M, 2nd presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences abstract: Web banner advertisements, which are placed on websites to guide users to a targeted landing page (LP), are still often selected manually because human preferences are important in selecting which ads to deliver. To automate this process, we propose a new benchmark, BannerBench, to evaluate the human preference-driven banner selection process using vision-language models (VLMs). This benchmark assesses the degree of alignment with human preferences in two tasks: a ranking task and a best-choice task, both using sets of five images derived from a single LP. Our experiments show that VLMs are moderately correlated with human preferences on the ranking task. In the best-choice task, most VLMs perform close to chance level across various prompting strategies. These findings suggest that although VLMs have a basic understanding of human preferences, most of them struggle to pinpoint a single suitable option from many candidates. language of the presentation: Japanese
Japanese title: BannerBench: A Human-Preference-Based Benchmark for Measuring the Ad-Selection Ability of Vision-Language Models Japanese abstract: Web banner ads, which are placed on websites to guide users to a specific landing page (LP), are still often evaluated manually because human preferences are considered important. Toward automating this process, we propose a new benchmark, BannerBench, which uses vision-language models (VLMs) to evaluate the ability to select banners based on human preferences. The benchmark uses sets of five images obtained from a single LP and evaluates the degree of alignment with human preferences on two tasks: a ranking task and a best-choice task. Experiments show that VLMs correlate moderately with human preferences on the ranking task, whereas on the best-choice task most VLMs stay close to chance level regardless of the prompting strategy. These results suggest that while VLMs have a basic understanding of human preferences, selecting the single best option from many candidates remains a challenge.
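The two BannerBench tasks can be illustrated with a short Python sketch. It assumes Spearman's rank correlation as the ranking-alignment metric and top-1 accuracy for the best-choice task; the abstract does not name the metrics, and the rankings below are made-up values, not benchmark data. With five candidates per set, a uniform random pick is correct 1/5 of the time, which is what chance level means here.

```python
from typing import Sequence


def spearman_rho(human_rank: Sequence[int], model_rank: Sequence[int]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(human_rank)
    if n != len(model_rank) or n < 2:
        raise ValueError("rankings must have equal length >= 2")
    d_squared = sum((h - m) ** 2 for h, m in zip(human_rank, model_rank))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def best_choice_accuracy(human_best: Sequence[int], model_best: Sequence[int]) -> float:
    """Fraction of image sets where the model picks the human-preferred image."""
    hits = sum(h == m for h, m in zip(human_best, model_best))
    return hits / len(human_best)


# Hypothetical ranks (1 = most preferred) for five banners from one LP.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # 0.8
# Hypothetical best-choice indices (0-4) for four image sets.
print(best_choice_accuracy([0, 3, 1, 4], [0, 2, 1, 1]))  # 0.5
```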

尾崎　慎太郎	M, 2nd presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: Uncovering the Socio-Economic Bias in Text-to-Image Generation Models abstract: This study investigates socio-economic bias in text-to-image generation models, a topic that has received far less attention than racial or gender bias. We examine how current models respond to different prompting strategies when generating images intended to reflect diverse income settings. Our analysis reveals that, regardless of the model or prompt design, the generated images consistently resemble scenes from high-income environments. We observe a strong association between income level and the visual characteristics of the generated outputs. Even when prompts explicitly include information about the country or region and its economic status, the models fail to integrate this context, producing images that ignore the intended socio-economic background. These findings expose a critical weakness in the models' capacity to represent a broad spectrum of global realities. language of the presentation: Japanese
Japanese title: Uncovering the Socio-Economic Bias of Image Generation Models Japanese abstract: This study focuses on socio-economic bias in models that generate images from text. Much prior research has examined the racial and gender biases of such models, whereas socio-economic bias has received almost no study. We examine what images current generation models produce for prompts intended to reflect different income levels. Our analysis shows that, regardless of the model used or the granularity of the prompt, the generated images consistently evoke high-income settings. Through this study, we found that images generated by the models are highly similar to images photographed in high-income settings. Furthermore, even when prompts explicitly included information about the country or region and its economic situation, some models failed to incorporate that context appropriately and generated images that did not reflect the intended socio-economic background. These results reveal that generation models have a serious deficiency in their ability to represent the diverse realities of the world.

吉田　大城	M, 2nd presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: Visual Priming Effect on Large-scale Vision Language Models abstract: Large-scale Vision-Language Models (LVLMs) integrate linguistic and visual information, demonstrating advanced task-solving capabilities. Because these models are derived from Large Language Models, they have strong capabilities for language tasks. However, the impact of additional visual information on model responses remains insufficiently understood. In this study, we focus on the priming effect, a psychological phenomenon, to investigate how visual information influences language task processing. We present intentionally designed images alongside two types of language tasks with different characteristics and analyze changes in the model's responses. Our experimental results show that model responses shift in the direction intended by the image, suggesting that LVLMs do not simply ignore visual information but actively incorporate it into language processing. Furthermore, the similarity between this behavior and priming effects observed in human cognition suggests that LVLMs may share certain aspects of human cognitive mechanisms. language of the presentation: Japanese
Japanese title: A Study of Visual Priming Effects in Large-scale Vision-Language Models Japanese abstract: Large-scale vision-language models (LVLMs) integrate linguistic and visual information and show advanced task-solving capabilities. Because these models are derived from large language models, they perform strongly on language tasks. However, the effect of additional visual information on model responses is not yet well understood. This study focuses on the priming effect, a psychological phenomenon, to investigate how visual information affects the processing of language tasks. In our experiments, we presented intentionally designed images together with two types of language tasks with different characteristics and analyzed changes in the model's responses. The results show that model responses shift in the direction intended by the image, suggesting that LVLMs do not simply ignore visual information but actively incorporate it into language processing. Furthermore, the similarity between this behavior and the priming effect observed in human cognition suggests that LVLMs may share some aspects of human cognitive mechanisms.

Date: July 22 (Tue), Period 1 (9:20-10:50)

Room: L3

Chair: 本司　澄空

細川　蓮	D, interim presentation	Ubiquitous Computing Systems	安本　慶一,	荒牧　英治,	諏訪　博彦
title: Research on Financial Market Prediction Using Heterogeneous Financial Text Data Integration and Reader Stance Indicators abstract: Predicting future financial markets is crucial for supporting investor decision-making and advancing risk management. In recent years, incorporating unstructured text data such as news articles and social media posts into machine learning models has enabled analysis that reflects investor psychology and social conditions that cannot be captured through numerical data alone. However, there are two challenges in utilizing text data for financial market prediction. First, a framework for comprehensively utilizing information from different media sources has not been established. Second, text analysis of stock discussion boards focuses on what posters (writers) write and does not sufficiently consider how readers, on the receiving side, interpret and react to it. This research proposes the following approaches to address these challenges. To solve Challenge 1, we propose to (1) integrate heterogeneous text sources (newspaper articles and stock discussion boards, a type of social media) to improve market prediction accuracy. To solve Challenge 2, focusing on the stance of posters in discussion board posts, we propose to (2) quantitatively analyze the divergence between posters' self-reported stances and the distribution of stance judgments by 100 readers, clarifying which types of posts cause such divergence. Furthermore, (3) we verify whether prediction accuracy improves when readers' collective recognition is incorporated into market prediction models.
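Japanese title: A Study of Financial Market Prediction Using Heterogeneous Financial Text Data Integration and Reader Stance Indicators Japanese abstract: Predicting future financial markets is important for supporting investors' decision-making and advancing risk management. In recent years, incorporating unstructured text data such as news articles and social media posts into machine learning models has enabled analyses that reflect investor sentiment and social conditions that numerical data alone cannot capture. However, using text data for financial market prediction faces two challenges. First, no framework has been established for making integrated use of information across different media sources. Second, text analysis of stock discussion boards emphasizes the content written by posters (writers) and does not sufficiently consider the interpretations and reactions of readers on the receiving side. This study proposes the following approaches to these challenges. To address Challenge 1, (1) we integrate heterogeneous text sources, namely newspaper articles and stock discussion boards (a type of social media), to improve market prediction accuracy. To address Challenge 2, focusing on posters' stances in discussion board posts, (2) we quantitatively analyze the divergence between the stance self-reported by the poster and the distributional stance judgments of 100 readers, and clarify what kinds of posts produce such divergence. Furthermore, (3) we verify whether incorporating readers' collective recognition into the market prediction model improves prediction accuracy. language of the presentation: Japanese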
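Item (2) can be sketched as follows, assuming each post carries one self-reported stance label and each of the 100 readers assigns one label from the same set. The `buy`/`hold`/`sell` labels, the function names, and the disagreement-rate measure are hypothetical choices for illustration; the study may quantify divergence differently (for example, with a distance between distributions).

```python
from collections import Counter
from typing import Sequence


def stance_divergence(self_reported: str, reader_labels: Sequence[str]) -> float:
    """Fraction of readers whose stance judgment differs from the poster's."""
    if not reader_labels:
        raise ValueError("need at least one reader judgment")
    disagree = sum(label != self_reported for label in reader_labels)
    return disagree / len(reader_labels)


def majority_flips(self_reported: str, reader_labels: Sequence[str]) -> bool:
    """True if the most common reader stance differs from the self-report."""
    majority, _ = Counter(reader_labels).most_common(1)[0]
    return majority != self_reported


# Hypothetical post: the poster tags it "buy", and 100 readers judge it.
readers = ["buy"] * 40 + ["hold"] * 45 + ["sell"] * 15
print(stance_divergence("buy", readers))  # 0.6
print(majority_flips("buy", readers))     # True
```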

小手川　康太	M, 2nd presentation	Ubiquitous Computing Systems	安本　慶一,	荒牧　英治,	諏訪　博彦
title: Explanation Generation Method for Financial Market Prediction Models Based on Feature Importance abstract: In financial market prediction, machine learning models that integrate bulletin board data and stock price data have been proposed, but their prediction process remains a black box. Conventional natural language explanations generated by LLMs do not fundamentally improve explainability, because the LLM's own reasoning process is opaque. This research proposes a method that quantitatively extracts feature importance from prediction models using SHAP (SHapley Additive exPlanations) and incorporates this mathematical basis into LLM prompts to generate more logical and interpretable explanations. Through experiments, we confirmed that using the extracted feature importance improves the transparency of the prediction rationale compared to conventional methods. We are currently considering how to evaluate explanation quality under multiple market conditions and how to generate explanations dynamically as conditions change over time. language of the presentation: Japanese
Japanese title: An Explanation Generation Method for Financial Market Prediction Models Based on Feature Importance Japanese abstract: In financial market prediction, methods using machine learning models that integrate bulletin board data and stock price data have been proposed, but their prediction process is a black box. Conventional natural-language explanations by LLMs do not fundamentally improve explainability, because the LLM's own reasoning process is opaque. This study proposes an explanation generation method that uses SHAP (SHapley Additive exPlanations) to quantitatively extract feature importance from the prediction model and incorporates this mathematical basis into LLM prompts to produce more logical and interpretable explanations. In experiments, we used feature importance to extract, from the bulletin board posts, those considered to have contributed to the prediction, and generated explanations based on them. Going forward, we are examining methods for evaluating the explanations and plan to verify that explanations generated with SHAP maintain quality compared with conventional explanations.
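A minimal sketch of the prompt-construction step, assuming SHAP values for one prediction have already been computed (for example with the `shap` package) and are available as a feature-to-value mapping. The feature names, values, `top_k_features`, and `build_explanation_prompt` are hypothetical; the abstract does not describe the actual prompt format. Features are ranked by absolute SHAP value because the sign only indicates whether a feature pushed the prediction up or down.

```python
from typing import Dict, List, Tuple


def top_k_features(shap_values: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Rank features by absolute SHAP value (largest contribution first)."""
    return sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


def build_explanation_prompt(prediction: str, shap_values: Dict[str, float], k: int = 3) -> str:
    """Embed the top-k SHAP contributions in a prompt for an LLM explainer."""
    lines = [
        f"The model predicted: {prediction}.",
        "Top contributing features (SHAP value; sign = direction of effect):",
    ]
    for name, value in top_k_features(shap_values, k):
        lines.append(f"- {name}: {value:+.3f}")
    lines.append("Explain the prediction using only the features listed above.")
    return "\n".join(lines)


# Hypothetical SHAP values for one prediction, computed elsewhere.
values = {
    "post_1842_sentiment": 0.21,
    "return_lag1": -0.08,
    "volume_change": 0.05,
    "post_977_sentiment": -0.15,
}
print(build_explanation_prompt("price up", values, k=3))
```

Grounding the prompt in the listed contributions, rather than asking the LLM to explain the model unaided, is the design choice the abstract motivates; how the real system phrases the prompt is not specified.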

松本　一晟	M, 2nd presentation	Ubiquitous Computing Systems	安本　慶一,	荒牧　英治,	諏訪　博彦,	松井　智一
title: Data Profile Generation Framework for IoT Data Utilization abstract: The rapid growth of data generated by Internet of Things (IoT) devices presents new opportunities for research and innovation. However, this data is often underutilized due to the technical expertise required to interpret and apply it effectively. To address this challenge, we propose an automated framework that leverages large language models (LLMs) to generate natural language dataset profiles. These profiles aim to help non-expert users understand what a dataset contains, what it represents, and how it can be used. The framework operates in four stages: data input, metadata extraction from raw datasets, profile generation using few-shot prompting with LLMs, and evaluation of the generated profiles. Each profile summarizes key aspects such as data types, subject domain, semantic content, and potential use cases. We evaluate our method using real-world IoT datasets and assess the output through both human annotation and automated metrics. Our results show that the generated profiles are coherent, relevant, and align well with human judgments, demonstrating the potential of LLMs to support accessible and scalable dataset understanding. language of the presentation: English