Colloquium B Presentations

Date and time: July 25 (Fri), 3rd period (13:30-15:00)


Room: L1

Chair: 佐々木光
LEE SANGMYEONG D, interim presentation, Natural Language Processing (Robot Dialogue Intelligence), 渡辺 太郎☆, 吉野 幸一郎, Angel Garcia Contreras
title: Resolving Linguistic Structural Ambiguity for Vision and Language Models
abstract: Structural ambiguity in natural language arises when a single sentence can be interpreted in multiple ways due to different possible syntactic structures, despite having an identical sequence of words. Resolving such ambiguity not only prevents application-oriented systems, such as task-driven robots, from misunderstanding the user's intent, but also fundamentally enhances the base model's comprehension ability. Since disambiguation requires additional information, visual context, being a core component of real-world understanding, plays a crucial role. While the rise of large Vision and Language Models (VLMs) is remarkable, relatively little work has addressed their capacity to handle structural ambiguity. This talk introduces previous attempts at disambiguation, current progress in developing an evaluation benchmark, and future plans for resolving this ambiguity more efficiently.
language of the presentation: Japanese
 
大中 緋慧 M, 2nd presentation, Natural Language Processing (Robot Dialogue Intelligence), 渡辺 太郎☆, 吉野 幸一郎, Angel Garcia Contreras
title: Joint prediction model of response timing and short response for real-time speech dialogue systems
abstract: Response timing in dialogue is a useful means of expressing a speaker's intent. From this perspective, research on methods to predict response timing has progressed. On the other hand, to effectively utilize such methods, a mechanism to mitigate the latency of generation modules, such as speech synthesis, is necessary. Based on this background, we propose a model that simultaneously predicts response timings and short responses for latency mitigation. The proposed method continuously predicts response timings, and when a response is determined, it selects an appropriate short response through contrastive learning-based ranking. We conducted individual objective evaluations for both tasks and confirmed that our proposed method outperforms comparative approaches in both response timing and short-response selection.
language of the presentation: Japanese
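The abstract says the short response is selected by contrastive-learning-based ranking but does not specify the objective or the encoder. The sketch below is a minimal illustration under stated assumptions: an InfoNCE-style loss (one common contrastive objective, assumed here, not confirmed by the abstract) for training, and cosine-similarity ranking at inference. The embedding vectors and candidate phrases are hypothetical stand-ins for a trained encoder's outputs.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def info_nce_loss(context, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style loss for one context: -log softmax score of the positive.

    Minimising it pulls the correct short response toward the context embedding
    and pushes the negative candidates away.
    """
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

def select_short_response(context, candidates):
    """Return the candidate whose embedding scores highest against the context."""
    ranked = sorted(candidates.items(), key=lambda kv: cosine(context, kv[1]), reverse=True)
    return ranked[0][0]

# Hypothetical toy embeddings; a real system would use a trained encoder.
context = [0.9, 0.1, 0.0]
candidates = {"Yes.": [1.0, 0.0, 0.0], "I see.": [0.0, 1.0, 0.0], "Hmm...": [0.0, 0.0, 1.0]}
print(select_short_response(context, candidates))  # Yes.
```

One plausible design consequence, not stated in the abstract: because the candidate set is fixed, audio for each phrase could be synthesized ahead of time, which fits the latency-mitigation goal.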
 
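小松 秀輔 M, 2nd presentation, Natural Language Processing (Robot Dialogue Intelligence), 渡辺 太郎☆, 吉野 幸一郎, Angel Garcia Contreras
title: Predicting User Acceptance of Intimate Behaviors by Robots
abstract: To enable appropriate, situation-dependent behavior selection, this study aims to use entrainment to predict how intimate users perceive their relationship with a robot to be and whether they accept the robot's intimate behaviors. The authors conducted dialogue experiments in scenarios where a robot performs intimate behaviors and investigated which variables are useful for deciding whether to perform an intimate behavior. Analyzing the relationship between entrainment measures during the dialogue and subjective evaluations of the intimate behaviors revealed that specific entrainment measures contribute to predicting intimate behavior.
language of the presentation: Japanese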
 
森 清忠 (prefers afternoon) M, 2nd presentation, Natural Language Processing (Robot Dialogue Intelligence), 渡辺 太郎☆, 吉野 幸一郎, Angel Garcia Contreras
title: Dialogue Response Prefetching Based on Semantic Similarity and Prediction Confidence of Language Model
abstract: Response prefetching, in which the system prepares a response in the middle of a user utterance, has been studied as a method for reducing user-perceived latency (UPL), the time the user waits for the system to generate a response, in spoken dialogue systems. Prediction confidence models (PCMs) estimate the probability of a character-level match between a complete user utterance and its predicted sentence, and determine when to prepare a response. However, even if the system does not predict the complete user utterance, it may be able to generate an appropriate response if it can estimate the intent of the utterance in the middle of the user utterance. In this study, we proposed a PCM based on semantic similarity instead of the conventional PCM based on literal agreement, and evaluated its effectiveness.
language of the presentation: Japanese
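To make the difference between the two kinds of PCM training target concrete, the following sketch contrasts a literal (exact-match) label with a graded semantic one for the same predicted completion. A bag-of-words cosine stands in for a sentence-embedding model only to keep the example self-contained, and the 0.7 prefetch threshold is a hypothetical value; neither comes from the study.

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Stand-in for an embedding similarity: cosine over word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def literal_target(predicted: str, actual: str) -> float:
    """Conventional-style label: 1 only if the completion matches character for character."""
    return 1.0 if predicted == actual else 0.0

def semantic_target(predicted: str, actual: str, sim=bow_cosine) -> float:
    """Semantic-style label: graded similarity between prediction and true utterance."""
    return sim(predicted, actual)

def should_prefetch(confidence: float, threshold: float = 0.7) -> bool:
    """Start preparing a response once the confidence passes a (hypothetical) threshold."""
    return confidence >= threshold

actual = "could you book a table for two tonight"
predicted = "could you book a table for two this evening"
print(literal_target(predicted, actual))               # 0.0
print(round(semantic_target(predicted, actual), 3))    # 0.825
print(should_prefetch(semantic_target(predicted, actual)))  # True
```

Under the literal target this prediction counts as a miss, while the semantic target rewards it because the intent is already clear mid-utterance.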
 

Date and time: July 25 (Fri), 3rd period (13:30-15:00)


Room: L2

Chair: Yi Gu
神鳥 有沙 M, 2nd presentation, Biomedical Imaging Intelligence, 大竹 義人, 松原 崇充, Soufi Mazen
title: Proposal of a Motion Evaluation Model Using Subject-Specific Musculoskeletal Analysis in Basic Classical Ballet Movements
abstract: Classical ballet demands both artistic expression and a high level of physical ability, while also placing significant stress on the body, leading to a high risk of injury for dancers. Many dancers experience musculoskeletal disorders, which can interrupt their careers and interfere with daily life, making injury prevention a critical issue. A major cause of injury is overuse resulting from incorrect technique, yet instruction currently relies largely on ballet teachers' guidance, with evaluations typically based on visual observation alone. While some previous studies have applied machine learning to evaluate dance performance, these are often based on subjective criteria defined by experts and lack consideration of internal physical strain. Furthermore, many motion analysis studies in classical ballet use generic musculoskeletal models, which fail to accurately reflect individual anatomical differences. In this study, we propose a motion evaluation model that utilizes simulations based on subject-specific musculoskeletal models and applies machine learning to the resulting data. This approach aims to enable objective and personalized assessments that take into account both movement efficiency and injury risk. In this presentation, we report the results of an accuracy validation of lower limb musculoskeletal models created using statistical shape modeling for individual subjects.
language of the presentation: Japanese
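Statistical shape models are commonly built as point distribution models by principal component analysis over corresponding surface points of training bones. The abstract does not give the exact formulation used, so the following is only a reminder of that usual form, under that assumption:

```latex
\mathbf{x} \approx \bar{\mathbf{x}} + \sum_{i=1}^{m} b_i\,\boldsymbol{\phi}_i,
\qquad
b_i = \boldsymbol{\phi}_i^{\top}\left(\mathbf{x} - \bar{\mathbf{x}}\right)
```

Here $\mathbf{x}$ stacks the coordinates of corresponding surface points of one bone, $\bar{\mathbf{x}}$ is the mean over the training shapes, $\boldsymbol{\phi}_i$ are the orthonormal principal modes (eigenvectors of the sample covariance), and $m$ is the number of retained modes. The projection formula for $b_i$ follows from the orthonormality of the modes; for a new subject, the coefficients are typically fitted to whatever image or landmark data are available.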
 
衣川 昌孝 M, 2nd presentation, Optical Media Interface, 向川 康博, 松原 崇充, 舩冨 卓哉
title: Geometric Transformation Field for Flow Field Representation Using a Gaussian Mixture Model
abstract: In this study, we aim to develop an analysis method that describes the spatiotemporal structure of flow fields in a highly interpretable form, and we propose a technique that models this structure as a geometric transformation field whose transformations vary smoothly over the image. Existing methods represent a flow field as a superposition of spatially extended functions, which makes them ill-suited for extracting localized, unsteady phenomena such as the temporal evolution of vortices. To address this, we capture local vortex structures using a Gaussian mixture model (GMM), represent them as a geometric transformation field, and formulate this representation as a parametric model. Using simulation data, we demonstrate that the proposed formulation can describe the spatiotemporal dynamics of vortex structures in a more intuitive manner.
language of the presentation: Japanese
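One way to picture a GMM-based geometric transformation field is a velocity field written as a Gaussian-weighted sum of local affine maps, where a skew-symmetric map is a local rotation, i.e. a vortex. This formulation and its parameters are illustrative assumptions, not the authors' model. Each component's centre, width and angular rate are directly readable, which is the kind of interpretability the abstract targets; letting them vary with time would describe a vortex's evolution.

```python
import math

def gaussian_weight(x, y, cx, cy, sigma):
    """Unnormalised isotropic Gaussian bump centred at (cx, cy)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2))

def velocity(x, y, components):
    """Velocity at (x, y) as a Gaussian-weighted sum of local affine maps.

    Each component is (cx, cy, sigma, A); the 2x2 matrix A acts on the offset
    from the centre. A = [[0, -w], [w, 0]] describes rotation at rate w.
    """
    u = v = 0.0
    for cx, cy, sigma, A in components:
        g = gaussian_weight(x, y, cx, cy, sigma)
        dx, dy = x - cx, y - cy
        u += g * (A[0][0] * dx + A[0][1] * dy)
        v += g * (A[1][0] * dx + A[1][1] * dy)
    return u, v

def vortex(cx, cy, sigma, omega):
    """Hypothetical component: counter-clockwise rotation for omega > 0."""
    return (cx, cy, sigma, [[0.0, -omega], [omega, 0.0]])

# Two hypothetical vortices: a broad counter-clockwise one and a tight clockwise one.
field = [vortex(0.0, 0.0, 1.0, 1.0), vortex(3.0, 0.0, 0.5, -2.0)]
print(velocity(1.0, 0.0, field))
```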
 
小橋口 純 M, 2nd presentation, Optical Media Interface, 向川 康博, 松原 崇充, 舩冨 卓哉
title: Depth from Focus with Test-time Optimization of Single-image Depth Estimation Models
abstract: Depth from Focus (DFF) is a technique for estimating depth by leveraging changes in a camera's focus distance. While deep learning-based DFF methods can estimate depth with accurate scale, they often suffer from low accuracy due to limited training data. Recently, foundation models such as Depth Anything have been proposed for single-image depth estimation, trained on large-scale datasets. The depth estimated by these foundation models is qualitatively good, but scale-ambiguous. Therefore, this study proposes a method that combines DFF with Depth Anything to achieve depth estimation that has both accurate scale and high qualitative performance. Specifically, DFF outputs are used to optimize Depth Anything’s parameters for each scene at test time. Experiments on real-world datasets demonstrate the effectiveness of the proposed method, showing improvements over conventional DFF methods.
language of the presentation: Japanese
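The core idea above is that DFF supplies metric scale while the foundation model supplies qualitatively good structure. The authors optimize Depth Anything's parameters per scene; the sketch below swaps in a much simpler technique, a closed-form weighted least-squares scale-and-shift alignment, just to show how DFF depth can pin down the scale of a relative prediction. It assumes the relative map r and the metric DFF depth m are related by m ≈ s·r + t; the weights (for example, DFF confidence) and the sample values are hypothetical.

```python
def fit_scale_shift(relative, metric, weights=None):
    """Weighted least squares for (s, t) minimising sum_i w_i * (s*r_i + t - m_i)**2."""
    if weights is None:
        weights = [1.0] * len(relative)
    sw = sum(weights)
    mean_r = sum(w * r for w, r in zip(weights, relative)) / sw
    mean_m = sum(w * m for w, m in zip(weights, metric)) / sw
    cov = sum(w * (r - mean_r) * (m - mean_m) for w, r, m in zip(weights, relative, metric))
    var = sum(w * (r - mean_r) ** 2 for w, r in zip(weights, relative))
    if var == 0.0:
        raise ValueError("relative depth is constant; scale is unidentifiable")
    s = cov / var
    return s, mean_m - s * mean_r

# Hypothetical values: relative prediction and DFF metric depth (metres) at four pixels.
relative = [0.1, 0.2, 0.4, 0.5]
metric = [1.2, 1.4, 1.8, 2.0]
s, t = fit_scale_shift(relative, metric)
aligned = [s * r + t for r in relative]
print(round(s, 6), round(t, 6))  # 2.0 1.0
```

A per-scene parameter optimization, as in the abstract, can correct errors that a single global scale and shift cannot, such as locally wrong relative structure.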
 
佐藤 敬介 M, 2nd presentation, Human Robotics, 和田 隆広, 松原 崇充, 劉 海龍
title: Station-keeping control of ROVs with limited field of view in water flow
abstract: Underwater work is widely required for the exploration of marine resources, inspection of underwater structures, aquaculture management, and more. However, it poses significant risks to divers due to high water pressure, tidal flows, and encounters with dangerous marine creatures. To reduce these risks, the use of ROVs (Remotely Operated Vehicles) is becoming increasingly common. However, small and lightweight commercial ROVs are particularly susceptible to the effects of water flow, making stable control difficult. Additionally, challenges such as complex operation and a shortage of skilled operators highlight the need for autonomous and highly accurate control technologies. In our previous research, we proposed a station-keeping control method using the Water Flow Coordinate System inspired by the Wind Coordinate System, and confirmed its effectiveness through real-world testing. However, we also found that simply introducing the water flow coordinate system could cause the AR marker to move out of the camera's field of view, making it impossible to estimate the ROV's position. To address this, this year we introduce a control barrier function (CBF) and integrate it with a control method designed to keep the approaching target object (i.e., the AR marker) within the sensor's field of view. This approach is expected to prevent tracking loss and improve overall control performance. Our final goal is to achieve positional and orientational accuracy sufficient for the ROV's gripper to grasp an object under water flow, verified through experiments with a real ROV in a controlled water tank. Through this research, we aim to improve the efficiency of underwater operations and enhance the practical usability of ROVs.
language of the presentation: Japanese
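A control barrier function keeps the state inside a safe set {h(x) ≥ 0} by requiring dh/dt + γ·h ≥ 0 of the control input. The sketch below applies this to a single yaw-rate command under strongly simplified, hypothetical kinematics: the marker's horizontal bearing φ changes as dφ/dt = −r for yaw rate r, and h = (α/2)² − φ² for half field of view α/2. A real controller would also have to handle translation, the water-flow coordinate system and thruster limits. With one scalar constraint, the usual CBF quadratic program (stay as close as possible to the nominal command) has a closed-form solution.

```python
import math

def fov_barrier(phi: float, half_fov: float) -> float:
    """h(phi) >= 0 while the marker bearing phi stays inside the half field of view."""
    return half_fov ** 2 - phi ** 2

def cbf_filter_yaw_rate(r_nom: float, phi: float, half_fov: float, gamma: float = 1.0) -> float:
    """Minimally modify a nominal yaw rate so that dh/dt + gamma*h >= 0.

    Assumed kinematics dphi/dt = -r give dh/dt = -2*phi*dphi/dt = 2*phi*r,
    so the constraint is a*r + b >= 0 with a = 2*phi and b = gamma*h.
    """
    a = 2.0 * phi
    b = gamma * fov_barrier(phi, half_fov)
    if a == 0.0 or a * r_nom + b >= 0.0:
        return r_nom  # nominal command already satisfies the constraint
    return -b / a  # closed-form solution of the one-constraint QP

half_fov = math.radians(30)  # hypothetical camera half field of view
phi = math.radians(25)       # marker already near the edge of the image
print(cbf_filter_yaw_rate(-0.5, phi, half_fov))  # turning away is slowed down
print(cbf_filter_yaw_rate(0.2, phi, half_fov))   # turning toward the marker is unchanged
```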
 

Date and time: July 25 (Fri), 3rd period (13:30-15:00)


Room: L3

Chair: 遠藤 新
飯田 昌直 M, 2nd presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Adaptive Divide-and-Conquer TSP Solver on Heterogeneous Computing Platform with Pareto-Optimal Cluster Connection
abstract: Solving large-scale Traveling Salesman Problems (TSP) at the 100,000-node level remains a computationally intractable challenge for conventional algorithms. To address this challenge, I propose a heterogeneous architecture on a CPU-FPGA platform. Implemented on an AMD Xilinx Versal ACAP, my framework strategically partitions workloads between the dual-core ARM Cortex-A72 CPU and the Programmable Logic (FPGA). The methodology employs an adaptive divide-and-conquer strategy using recursive K-means clustering. The CPU manages control-intensive operations, such as determining Pareto-optimal inter-cluster connections, while the FPGA fabric is dedicated to accelerating massively parallel, compute-intensive tasks such as intra-cluster route optimization. The system was validated on TSP benchmark datasets scaling up to 85,900 nodes. My architecture achieves a 50–80× speedup over CPU-only implementations and demonstrates 6× greater power efficiency than GPU-based solutions, resulting in a superior Energy-Delay Product (EDP). This approach targets solving 100,000-node instances in under 10 seconds with a power envelope below 50 W, while keeping solution quality within 3% of optimal. It demonstrates the potential for high-performance, energy-efficient TSP solving, with promising applications in latency-critical and power-constrained fields such as edge computing and autonomous systems.
language of the presentation: Japanese
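The adaptive divide-and-conquer idea can be sketched on the CPU side alone: recursively split cities with k-means, solve small clusters with nearest-neighbour construction plus 2-opt, and stitch the sub-tours together. The leaf size, k, the centroid-order stitching and the local solvers are assumptions made for illustration; the Pareto-optimal inter-cluster connection and the FPGA offloading described above are not reproduced.

```python
import math
import random

Point = tuple[float, float]

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(tour: list[Point]) -> float:
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def centroid(points: list[Point]) -> Point:
    return (sum(p[0] for p in points) / len(points), sum(p[1] for p in points) / len(points))

def kmeans(points: list[Point], k: int, iters: int = 20, seed: int = 0) -> list[list[Point]]:
    """Plain Lloyd's k-means; returns only the non-empty clusters."""
    centers = random.Random(seed).sample(points, k)
    groups: list[list[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]

def nearest_neighbor(points: list[Point]) -> list[Point]:
    """Greedy construction: always visit the closest unvisited city next."""
    tour, remaining = [points[0]], list(points[1:])
    while remaining:
        nxt = min(remaining, key=lambda p: dist(tour[-1], p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(tour: list[Point]) -> list[Point]:
    """Reverse segments as long as doing so shortens the closed tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % len(tour)]
                if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-12:
                    tour[i:j + 1] = tour[i:j + 1][::-1]
                    improved = True
    return tour

def solve(points: list[Point], leaf_size: int = 8, k: int = 4) -> list[Point]:
    """Recursively split with k-means, solve small clusters directly, then stitch."""
    if len(points) <= leaf_size:
        return two_opt(nearest_neighbor(points))
    clusters = kmeans(points, min(k, len(points)))
    if len(clusters) == 1:  # degenerate split (e.g. duplicate points): solve directly
        return two_opt(nearest_neighbor(points))
    # Visit clusters in nearest-neighbour order of their centroids.
    cents = [centroid(c) for c in clusters]
    order, left = [0], list(range(1, len(clusters)))
    while left:
        nxt = min(left, key=lambda i: dist(cents[order[-1]], cents[i]))
        left.remove(nxt)
        order.append(nxt)
    tour: list[Point] = []
    for idx in order:
        sub = solve(clusters[idx], leaf_size, k)
        if tour:  # enter each sub-tour at the city closest to the current end
            start = min(range(len(sub)), key=lambda i: dist(tour[-1], sub[i]))
            sub = sub[start:] + sub[:start]
        tour.extend(sub)
    return tour

rng = random.Random(42)
cities = [(rng.random(), rng.random()) for _ in range(300)]
tour = solve(cities)
print(len(tour), round(tour_length(tour), 3))
```

Each leaf solve is independent of the others, which is the property that makes intra-cluster optimization a natural candidate for parallel hardware.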
 
衛藤 優 M, 2nd presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Performance Evaluation and Bottleneck Analysis of Large Language Models on a Linear Array CGRA
abstract: Generative AI services, typified by ChatGPT, are now widely used in society and are attracting global attention. On the other hand, the growing demand for computational resources such as GPUs is straining power supply, making it extremely important to balance processing performance with power efficiency. In this study, we evaluated the performance of a Large Language Model (LLM) on IMAX3, a prototype of the Coarse-Grained Linear Array (CGLA), a linear-array Coarse-Grained Reconfigurable Architecture proposed by our research group. Subsequently, to address the bottlenecks identified in this evaluation, we implemented IMAX4 and evaluated its improvements in processing speed and power efficiency in comparison with existing computing platforms such as CPUs. The evaluation results showed that IMAX4 significantly improved end-to-end latency compared to IMAX3. However, a new bottleneck was also identified in IMAX4: the internal data transfer rate within the FPGA was lower than both its theoretical performance and that of IMAX3. Future research will focus on identifying and optimizing this data transfer bottleneck, with the goal of establishing a CGLA-based accelerator that can compete with GPGPUs.
language of the presentation: Japanese
 
竹内 歩夢 M, 2nd presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Power Efficiency in Attention Mechanisms Using IMAX
abstract: With the rapid expansion of generative AI, Transformer models have become widely adopted, but their internal Attention mechanism poses a bottleneck due to its quadratic complexity in both computation and memory. FlashAttention has emerged as an efficient algorithm that reduces memory access and enables faster computation while producing exactly the same numerical output. However, its implementations have been limited to GPU platforms. In this work, we propose a power-efficient realization of FlashAttention on IMAX3, a linear-array-based Coarse-Grained Reconfigurable Array (CGRA) that we refer to as a Coarse-Grained Linear Array (CGLA) accelerator. By adapting the block-wise computation and memory layout of FlashAttention to the resource-constrained IMAX3 architecture, we demonstrate that efficient Attention computation is possible in a broader range of hardware environments. Our implementation achieves a 4–5× speedup over naive attention, reduces memory usage, and preserves numerical accuracy. Moreover, Energy-Delay Product (EDP) analysis reveals that FlashAttention on IMAX3 outperforms the NVIDIA RTX 4090 for short sequence lengths, highlighting its competitiveness in power efficiency.
language of the presentation: Japanese
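FlashAttention's exactness rests on the online-softmax recurrence: keys and values are processed block by block while a running maximum, a running normalizer and an unnormalized output are rescaled whenever the maximum grows. The sketch below shows that recurrence for a single query vector in plain Python and compares it with naive attention. It does not model tiling over queries, on-chip memory, or the IMAX3 mapping, and the block size is an arbitrary illustrative parameter.

```python
import math

def naive_attention(q, K, V):
    """Reference: softmax(q . K^T / sqrt(d)) V for a single query vector q."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))]

def blockwise_attention(q, K, V, block=2):
    """Same result, visiting K/V one block at a time with an online softmax.

    Only a running max m, a running normaliser l and an unnormalised output acc
    are kept, so the full row of scores is never stored.
    """
    d = len(q)
    m, l = -math.inf, 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in Kb]
        m_new = max(m, max(s))
        scale = math.exp(m - m_new)  # rescale earlier partial sums to the new max
        p = [math.exp(si - m_new) for si in s]
        l = l * scale + sum(p)
        acc = [a * scale + sum(pi * v[j] for pi, v in zip(p, Vb)) for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]

q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5], [2.0, 1.0]]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0], [1.0, 1.0], [0.5, -1.0]]
print(naive_attention(q, K, V))
print(blockwise_attention(q, K, V, block=2))  # same values up to floating-point rounding
```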
 
今村 元洲 M, 2nd presentation, Cyber Resilience Engineering, 門林 雄基, 中島 康彦, 妙中 雄三
title: A Pre-execution Detection System for Cryptojacking Attacks on Containers
abstract: Public container registries such as Docker Hub allow anyone to freely share container images. However, attackers have exploited this openness by distributing container images embedded with cryptojacking malware. Once such mining operations are executed, they abuse the host's hardware resources without authorization. Although static analysis of container images has been proposed as a countermeasure, it fails to detect threats when mining programs are installed after the container has already been launched. On the other hand, dynamic analysis enables real-time monitoring of container behavior and can detect ongoing mining activities, but it suffers from the critical limitation that the mining process may have already commenced by the time detection occurs. In this study, we propose a novel detection system that leverages dynamic analysis information prior to container execution, aiming to identify and prevent cryptojacking attacks before they can be launched.
language of the presentation: Japanese