Colloquium B Presentations

Date and time: June 12 (Thu), 3rd period (13:30-14:15)

Venue: Group A

TRAN VAN DUY	M, 1st presentation	Computing Architecture	中島　康彦,	林　優一,	張　任遠,	KAN Yirong,	PHAM HOAI LUAN,	Le Vu Trung Duong
title: QEA: An Accelerator for Quantum Circuit Simulation with Resources Efficiency and Flexibility abstract: Quantum circuit simulation has attracted considerable attention in recent years. However, because computational costs grow exponentially, assessing and validating these models on large datasets poses significant obstacles. Despite extensive research on quantum simulation, issues such as memory management, system scalability, and execution efficiency remain unresolved. In this study, we introduce QEA, a state vector-based hardware accelerator that addresses these difficulties with four key improvements: optimized memory allocation management, open PE, flexible ALU, and simplified CX swapper. To evaluate QEA's capabilities, we implemented it on the AMD Alveo U280 board, a widely used FPGA platform. We conducted a series of experiments to assess QEA's memory management efficiency, system scalability, and execution efficiency. Initial results show that QEA improves memory usage efficiency by a factor of 100 to 10,000 compared to a software implementation for circuits of 3 to 17 quantum bits. As next steps, we will evaluate QEA on a variety of quantum circuits to analyze its scalability and execution efficiency comprehensively in comparison with previous related studies. language of the presentation: English
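To make the memory pressure of state vector-based simulation concrete, here is a minimal software sketch of applying a single-qubit gate to a state vector, plus the size of that vector. It is not QEA's design: the function names, the bit ordering (qubit 0 as the least significant bit), and the 16 bytes per amplitude (double-precision complex) are assumptions for illustration only.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector.

    Assumes qubit 0 is the least significant bit of the basis index.
    """
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:          # visit each amplitude pair once
            j = i | stride             # partner index with the target bit set
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state

def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for 2**n amplitudes (16 bytes each is an assumed format)."""
    return (1 << n_qubits) * bytes_per_amplitude

# Hadamard and Pauli-X gates as plain nested lists.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
X = [[0, 1], [1, 0]]
```

Under these assumptions a 17-qubit state vector already needs `state_vector_bytes(17)` = 2,097,152 bytes, and the figure doubles with every added qubit.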

DHITAL NEETI	M, 1st presentation	Cyber Resilience	門林　雄基,	林　優一,	妙中　雄三
title: Memory Forensics in Ephemeral Linux Virtual Machines on AWS: Prototype Implementation and Future Work abstract: In this research, we investigate the challenges of performing memory forensics in ephemeral Linux virtual machines (VMs) on Amazon Web Services (AWS), focusing in particular on Spot Instances, which can be terminated with little notice. These instances pose significant difficulties for acquiring volatile memory, which is critical for digital forensics and incident response. We are currently developing a working prototype that automates memory acquisition using AVML, AWS Systems Manager (SSM), and Amazon S3. Through our experiments with ephemeral instances, we identified several limitations, such as incomplete memory captures, delays in execution, and failures due to abrupt shutdowns. These limitations affect the reliability and consistency of forensic evidence collection in cloud-native environments. As part of our ongoing research, we aim to address these challenges by enhancing the timing, automation, and fault tolerance of the acquisition process. Furthermore, we plan to extend this prototype to other major cloud platforms, including Microsoft Azure and Google Cloud Platform, in order to explore the portability and generalizability of our approach. Our goal is to contribute toward a more resilient and automated memory forensics framework that can operate effectively in volatile multi-cloud environments. language of the presentation: English

SIY JOSHUA SAMUEL	M, 1st presentation	Cybernetics and Reality Engineering	清川　清,	向川　康博,	内山　英昭,	Perusquia Hernandez Monica,	平尾　悠太朗
title: Temporal Control For Motion Diffusion Models abstract: This presentation investigates the critical role of temporal control in text-to-motion generation, emphasizing its necessity for achieving precise and realistic human motion synthesis using Motion Diffusion Models and their advanced iterations, LGTM and SALAD. We begin by outlining the MDM framework (Tevet et al., 2022), which translates textual prompts into 3D motion sequences, but struggles with temporal precision in action frequency and pacing. We then explore LGTM (Zhang et al., 2023), which enhances temporal fidelity through a Disentangling VAE and Text Cross Attention, and SALAD (Xu et al., 2024), which leverages a skeletal-aware latent diffusion network with transformer-based architectures to better capture temporal dynamics. Utilizing the HumanML3D and AMASS datasets, we highlight how temporal control ensures accurate sequencing and repetition of movements, critical for applications like animation and robotics. Survey results demonstrate SALAD’s improved performance in action frequency and speed, underscoring the impact of temporal control. We address challenges in involvement prompting and action frequency, where models often fail to adhere to specified repetitions (e.g., "Person punched 100 times"), proposing solutions like Text to Temporal Quantitative Suggestion and Cross-Attention blocks. This work highlights why temporal control is essential for expressive and accurate motion generation, paving the way for future advancements in text-to-motion systems. language of the presentation: ENGLISH

XIAO JINGNAN	M, 1st presentation	Social Computing	荒牧　英治,	渡辺　太郎,	若宮　翔子,	PENG SHAOWEN
title: Structured Knowledge Generation and Validation via LLMs: Exploring the Possibility of Replacing Knowledge Graphs for Recommendation abstract: Knowledge Graphs (KGs) are widely used in recommender systems to provide structured semantic information, but they often suffer from issues such as incomplete coverage, semantic fragmentation, and limited structure propagation. Recently, Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and structured knowledge generation, offering potential to address the limitations of traditional KGs. However, the stability, structural validity, and semantic reliability of LLM-generated knowledge remain unverified, making it necessary to explore their feasibility as a substitute or complement to traditional KGs. This study first evaluates the consistency of LLMs in reproducing existing KG triples through multi-turn prompting. Next, it assesses the plausibility and trustworthiness of triples generated from natural language inputs using mechanisms such as reverse verification and overlap analysis. Currently, the research has entered the third stage, where recommendation graphs based on LLM, KG, and their combination (LLM+KG) are constructed. A unified graph neural network model is applied to compare their recommendation performance and examine whether LLM-generated structural knowledge can effectively replace or enhance traditional KGs. The findings are expected to offer experimental support and methodological insights for the design of dynamic knowledge structures and KG completion in recommender systems. language of the presentation: English

SUWANACHOTE NABHAN	M, 1st presentation	Software Design and Analysis	飯田　元,	松本　健一,	柏　祐太郎,	Reid Brittany
title: Evaluating the impact of Domain Adaptation for Code Completion Models abstract: Code completion tools are widely used in modern software development environments, with deep learning (DL)-based models offering significant improvements over traditional approaches through their ability to provide context-aware and multi-line suggestions. Recent work has shown that domain adaptation—fine-tuning language models on project-specific or domain-specific code—can further enhance completion quality. However, the specific benefits and behaviors of such adapted models remain underexplored. In this study, we evaluate the impact of domain adaptation on multiple large language models (LLMs) for code completion across several software projects. We examine how domain adaptation affects model performance at the project level. Our findings offer empirical insights into the effectiveness of domain-adapted code completion models and contribute to a deeper understanding of their applicability and limitations. language of the presentation: English

山本　啓太	M, 1st presentation	Human Robotics	和田　隆広,	松原　崇充,	劉　海龍,	織田　泰彰,	本司　澄空
title: Recommending Level of Haptic Authority Based on Sensor Reliability Using a Grip Mechanism abstract: Haptic shared control (HSC) is a branch of shared control in which both a human operator and an autonomous controller exert force on a shared, physical control terminal of a robot. HSC fluidly combines human intelligence and machine precision, improving task performance and reducing the human workload. However, when the autonomous controller is unreliable, the frequent requirement for human intervention adversely affects the task performance and the operator workload. Previous research showed that allowing the human operator to adjust the strength of the force exerted by the autonomous controller can alleviate the problem. However, the decision of when and how much to adjust the haptic strength remained a challenging task for a human operator. This study proposed an approach that utilized a grip mechanism for adjusting and recommending an appropriate strength. In the proposed approach, the human operator was responsible for deciding the haptic strength via the grip angle. The autonomous controller suggested the appropriate haptic strength based on its reliability by applying proportional control to the grip angle. An experiment with a simulated remotely operated vehicle (ROV) teleoperation with HSC guidance showed that the proposed method is effective in aiding the adjustment of autonomous controller input strength, resulting in a marginal reduction in workload. language of the presentation: English

武林　祥貴	M, 1st presentation	Robot Learning	松原　崇充,	和田　隆広,	柴田　一騎,	鶴峯　義久,	佐々木　光
title: Imitation Learning for a Legged Robot through a Demonstration Interface abstract: Autonomously controlling legged robots’ locomotion is a fundamental challenge in robotics, with applications ranging from search and rescue to delivery and home assistance. Reliable autonomous locomotion could make these robots useful in daily life, enhancing convenience, safety, and accessibility. Reinforcement learning (RL) has shown effective results in training locomotion policies, but it often requires carefully tuned reward functions and large amounts of training data, which are difficult to design and collect, especially for complex or task-specific behaviors. One promising alternative is to use demonstrations provided by humans through interfaces. Human demonstrations offer rich, task-relevant behavior that can guide robot learning without the need for manual reward design. However, due to the significant differences in body structure and movement between humans and quadruped robots, directly using this data is often infeasible. This research explores how a quadruped robot with a front-mounted gripper can learn object-picking tasks from human demonstrations. We use Generative Adversarial Imitation Learning (GAIL), which combines GANs with reinforcement learning and trains a policy to imitate expert actions using a discriminator. While GAIL works well with robot-generated data, its feasibility with human demonstrations is uncertain due to differences in embodiment and the discriminator’s inability to judge whether expert actions are physically feasible for the robot. Our goal is to reduce reliance on robot-specific data and enable more flexible, human-guided learning. language of the presentation: English

LI ZIQI	M, 1st presentation	Optical Media Interface	向川　康博☆,	川西　康友,	薗頭　元春,	舩冨　卓哉,	藤村　友貴,	北野　和哉
title: Object-aware RGB-to-Thermal Image Translation for Kitchen Food Ingredients abstract: Accurately estimating the temperature of food ingredients during cooking is essential for ensuring safety, flavor, and optimal timing in smart kitchen systems. To achieve this without expensive thermal cameras, we explore an RGB-to-Thermal image translation approach. However, existing methods such as UNIT-DDPM often struggle to preserve fine-grained object-level information, which is critical for accurately generating thermal patterns of individual ingredients. We propose Seg-UNIT-DDPM, a framework that incorporates segmentation masks from the Segment Anything Model (SAM) into UNIT-DDPM, allowing the model to focus on each food item separately. This object-aware guidance helps generate more precise and physically plausible thermal images suitable for downstream temperature analysis. language of the presentation: English

LIU HANZE	M, 1st presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: Lyrics translation with syllable-melody alignment abstract: This work explores an approach to lyric translation that preserves both meaning and musicality by aligning translated syllables with the original melody. Traditional translation methods often neglect the prosodic structure tied to the melody, resulting in unsingable outputs. We propose a syllable-melody alignment framework that integrates phonetic duration modeling and melody-constrained decoding to generate translations that are rhythmically and semantically coherent and singable. language of the presentation: English

YU SHUYI	M, 1st presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: An Investigation of Jailbreak via Text Perturbation abstract: Large language models (LLMs) are powerful but remain vulnerable to jailbreak attacks that bypass built-in safety mechanisms. In this work, we investigate how minor modifications to input prompts—referred to as text perturbations or unnormalized text—can effectively trigger such attacks. We evaluate the effectiveness of these perturbations across models of varying sizes (from 1B to 7B parameters), aiming to uncover the relationship between model scale and sensitivity to unnormalized inputs. Furthermore, we analyze the internal tendency of the models using attention transformation patterns and tools such as Logit-Lens to better understand the mechanisms underlying successful jailbreaks. We also discuss the characteristics of effective perturbations and consider potential enhancement of defenses for future LLMs. language of the presentation: English

児玉　創太郎	M, 1st presentation	Brain and Behavior Modeling	田中　沙織,	池田和司,	久保　孝富,	荻島　大凱
title: Bayesian modeling of abnormal attentional bias in anxiety disorder abstract: Humans employ the capacity of attention, the selective processing of information, to make effective use of their limited information-processing resources. Although attention is a fundamental and important cognitive function, its underlying mechanisms remain largely unknown due to the diversity of its components and processes. In recent years, research adopting the Bayesian-inference framework to understand cognition and behavior has grown, and various features of attention are increasingly being captured mathematically. One symptom linked to attention is anxiety. Anxiety disorders, which have the highest prevalence among psychiatric conditions, still involve many unresolved questions in their pathophysiology, but the involvement of abnormal attentional biases has been suggested. In this study, we aim to elucidate the mechanisms of anxiety disorders by designing Bayesian models and conducting behavioral experiments on tasks in which patients with anxiety disorders exhibit abnormal attentional biases. language of the presentation: Japanese

Date and time: June 12 (Thu), 3rd period (14:15-15:00)

Venue: Group B

ZHONG JIAJUN	M, 1st presentation	Computing Architecture	中島　康彦,	林　優一,	張　任遠,	KAN Yirong,	PHAM HOAI LUAN,	Le Vu Trung Duong
title: Hardware-Software Co-Design: An Accelerator for Transformer SNN Image Classifier with Multimodal Inputs abstract: Spiking Neural Networks (SNNs), regarded as the third generation of Artificial Neural Networks (ANNs), have attracted many researchers because information within them is transmitted as spikes, which allows devices to run with lower energy consumption. Transformer SNNs (TSNNs) for image classification combine the merits of SNNs with those of the Transformer, which is proficient at capturing relationships between image patches. They have shown promising performance compared to conventional convolutional networks. Multimodal information, such as audio and images, helps networks better understand the features of specific classes. In multimodal networks, a crucial operation is to effectively leverage the information that the attention mechanism extracts from multiple modalities. However, existing fusion methods are not efficient enough, which limits performance in real-world multimodal scenarios. In addition, recent research on TSNN accelerators focuses mainly on unimodal inputs. In this research, we will first find and adopt a strategy that makes full use of the features extracted by Spiking Cross Attention to improve the performance of the multimodal network. The opportunities offered by these algorithmic optimizations will then be harnessed in domain-specific hardware solutions. Extensive experiments will be conducted on the benchmark datasets CIFAR10-AV, UrbanSound8K-AV, and MNISTDVS-NTIDIGITS. language of the presentation: English

YAMEOGO KISWENDSIDA ARISTIDE CHILDERIC	M, 1st presentation	Cyber Resilience	門林　雄基,	安本　慶一,	妙中　雄三
title: Enhancing Wearable Device Security: Comparative Analysis, Penetration Testing Methodology, and a Novel Security Framework abstract: We present a comprehensive study on the security of wearable devices across multiple categories, including smartwatches, medical devices, smart glasses, and VR headsets. We perform a comparative security analysis by evaluating two devices per category (one high-end and one low-cost), with a focus on their wireless connectivity (BLE, Wi-Fi, NFC, UWB, ANT+), firmware robustness, and data protection mechanisms. In addition to the comparative analysis, we propose a novel penetration testing methodology specifically designed for wearable devices. Our approach integrates both passive and active testing techniques in a controlled lab environment to systematically evaluate vulnerabilities across key attack surfaces such as wireless communications, firmware integrity, and authentication processes. The methodology is supported by a detailed checklist workflow and metrics for success based on industry standards (e.g., CVSS). Building on our experimental findings, we also introduce a security framework aimed at enhancing the protection of sensitive health data and personal information in wearables. This framework provides practical recommendations for improving security in low-cost devices without significantly increasing production costs, and serves as a guide for manufacturers and researchers to address common security gaps. Overall, this research contributes to a better understanding of the current security posture of wearable devices and offers actionable insights for future improvements in both methodology and design. language of the presentation: English

ACO ELYANAH MARIE CARIAGA	M, 1st presentation	Social Computing	荒牧　英治,	Sakriani Sakti,	若宮　翔子,	PENG SHAOWEN
title: Designing Belonging: A Multimodal Study of Community-Exclusive Graphicons Across Digital Platforms abstract: Graphicons, or visual devices used in text-based computer-mediated communication (CMC), have evolved in form and function in parallel with digital communication. The recent emergence of what we term community-exclusive graphicons (CGs) accessible only to members of a particular online community possibly highlights the role of these symbols as cultural capital more explicitly than public ones, although to what extent this is true in actual CG systems is unknown. We examine how online communities in different digital platforms curate their CG systems, testing the feasibility of multimodal models to uncover design patterns at scale. Preliminary findings on Twitch.tv channels demonstrate the capabilities of OpenAI's GPT-4o and CLIP for capturing semantic and visual information, and suggest that communities design CGs to align with more universally recognizable visual language despite their exclusivity. Next steps include assessing the robustness of results and expanding the study to include other community-centric platforms such as Discord. language of the presentation: English

GAIDUCHENKO SOFIA	M, 1st presentation	Social Computing	荒牧　英治,	松本　健一,	若宮　翔子,	PENG SHAOWEN
title: Large Language Model Challenges to Detect Cancer-Related Cognitive Impairment from Patient Short Speech abstract: Recent studies have reported that the cognitive function of cancer patients often declines, a condition known as Cancer-Related Cognitive Impairment (CRCI). Since patients with cancer need to make high-stakes decisions about their treatment by themselves, it is important to measure CRCI. To that end, this study uses language-based CRCI screening to examine the language ability of participants and to determine whether a natural language processing-based system can detect CRCI. We obtained speech samples and cognitive impairment scores from participants, including patients with cancer (n = 224; 86 men and 138 women; average age = 52.9 y/o). Using LLMs, we extracted 8 linguistic metrics from the collected data. We divided patients into high-cognitive (n = 208) and low-cognitive (n = 12) groups. There was no statistically significant difference between the two groups, and the results did not show any correlation between CRCI and the language features derived from participants' speech using LLMs. Still, a few language metrics showed slight promise for CRCI screening. language of the presentation: English

SETTEWONG TASHA	M, 1st presentation	Software Engineering	松本　健一,	安本　慶一,	嶋利　一真,	Fan Youmei
title: Characterizing Top Ranking Data Science Jupyter Notebooks in Kaggle Competitions abstract: This study analyzes 7,987 Kaggle notebooks to identify factors that influence user tier, focusing on notebook, code, and description attributes and using machine learning and statistical methods. Demand for data science skills is high, and the study can help aspiring data scientists and machine learning practitioners improve their performance by understanding the factors that contribute to notebook popularity and quality. language of the presentation: English

RAHMAN MIZANUR	M, 1st presentation	Dependable System	井上　美智子,	中島　康彦,	WANG Wenyuan,	江口　僚太,	笹田　大翔
Title: Fault-Aware Quantization for Reliable QNNs on Memristor Crossbars Abstract: Memristor crossbars offer a promising platform for deploying Neural Networks (NNs) due to their high integration density and energy-efficient parallel computation. Quantization is essential as memristor devices offer only a limited set of stable conductance levels, restricting the precision available for weight representation. However, device-level imperfections, particularly stuck-at faults (SA0/SA1), affecting more than 10% of memristors pose significant challenges to inference reliability. This research aims to investigate the impact of such faults on QNN performance and explore fault-aware quantization and mapping strategies to improve robustness. By modeling realistic fault behavior and evaluating both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), we seek to identify efficient techniques that mitigate accuracy degradation without requiring retraining or hardware redundancy.
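The fault model named in this abstract can be illustrated with a small simulation: weights are quantized to a few levels, then a fraction of cells is forced to the lowest or highest level. This is only a sketch under stated assumptions, not the presenter's method: the uniform quantizer, the choice of mapping SA0 to the lowest level and SA1 to the highest, the fault rates, and all function names are hypothetical.

```python
import random

def quantize(weights, levels=4, w_min=-1.0, w_max=1.0):
    """Uniformly map each weight to an integer level in [0, levels - 1]."""
    step = (w_max - w_min) / (levels - 1)
    codes = []
    for w in weights:
        clipped = min(max(w, w_min), w_max)
        codes.append(round((clipped - w_min) / step))
    return codes

def dequantize(codes, levels=4, w_min=-1.0, w_max=1.0):
    """Map integer levels back to weight values."""
    step = (w_max - w_min) / (levels - 1)
    return [w_min + c * step for c in codes]

def inject_stuck_at(codes, levels=4, p_sa0=0.05, p_sa1=0.05, seed=0):
    """Force a random subset of cells to level 0 (SA0) or levels - 1 (SA1).

    The SA0/SA1-to-level mapping and the rates are assumptions.
    """
    rng = random.Random(seed)
    faulty = []
    for c in codes:
        r = rng.random()
        if r < p_sa0:
            faulty.append(0)
        elif r < p_sa0 + p_sa1:
            faulty.append(levels - 1)
        else:
            faulty.append(c)
    return faulty
```

Comparing `dequantize(codes)` with `dequantize(inject_stuck_at(codes))` gives a rough view of the weight error that a fault-aware scheme would try to reduce.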

YAO KOUAME JEAN FLORENTIN	M, 1st presentation	Ubiquitous Computing Systems	安本　慶一,	Sakriani Sakti,	諏訪　博彦,	佐々木　航,	松井　智一
title: Gloss-free Sign Language Translation with Hamburg Notation System abstract: Automatic sign language recognition is essential for promoting the inclusion of deaf and hard-of-hearing individuals in everyday communication. In this context, a gloss is a textual annotation that represents the meaning of a sign; it bridges the visual gesture and written text, but its manual annotation remains costly and specific to each corpus. To overcome this obstacle, gloss-free recognition relies on alternative representations. Existing methods often use alternative intermediate labels optimized for a specific sign language or domain, which limits their generalization and makes retraining on a new corpus expensive. In this study, we propose employing the Hamburg Notation System (HamNoSys)—an interlingual system capable of describing virtually all sign languages—as an intermediate label. A Vision Transformer is trained to predict, from each video, the corresponding HamNoSys sequence. The extracted visual representations are then fed to an LLM to generate the video-to-text translation. We will evaluate generalization on multiple corpora (e.g., RWTH-PHOENIX and CSL) by comparing BLEU and ROUGE scores against systems that use other intermediate representations. This approach facilitates extension to additional sign languages without requiring costly new annotations or retraining of the feature extractor. language of the presentation: English

LI YAN	M, 1st presentation	Computational Systems Biology	金谷　重彦,	松本　健一,	小野　直亮,	MD.Altaf-Ul-Amin
title: Development of Deep Learning Models for Automated Stone and Urinary Tract Segmentation in CT Images and Feature Extraction for Recurrence Risk Prediction abstract: Kidney stone disease is a common and recurrent urological condition that poses significant diagnostic and treatment challenges due to its anatomical complexity and high relapse rate. This study proposes a deep learning-based framework for the automated analysis of abdominal CT images to enhance the clinical management of kidney stones. The system consists of two core modules: (1) a stone segmentation model based on a VGG11-U-Net architecture to detect and outline kidney stones, and (2) a urinary tract segmentation model to reconstruct the 3D anatomy of the kidneys, ureters, and bladder. From these segmentation results, a set of quantitative features is extracted, including stone count, volume, shape complexity, and anatomical abnormalities such as dilation or tortuosity. These imaging-derived features are then combined with clinical data (such as patient age, gender, recurrence history, and surgical background) to train a recurrence risk prediction model. The predictive model, using algorithms such as XGBoost or a multilayer perceptron, outputs both a recurrence probability and an estimated recurrence interval. Additionally, explainability techniques such as SHAP are employed to highlight key contributing factors in each prediction. This integrative system aims not only to assist in rapid and consistent diagnosis but also to provide individualized risk assessment for recurrence, thereby enabling more targeted follow-up and preventive strategies. The proposed workflow offers a novel approach to combining medical imaging and patient history for intelligent decision support in urolithiasis management. language of the presentation: English

長谷川　遼	M, 1st presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: Knowledge Editing Induces Underconfidence in Language Models abstract: As language models continue to scale, the demand for knowledge editing, a method for updating knowledge without retraining, has increased. However, since knowledge editing directly alters the token prediction probabilities acquired during pretraining, these probabilities may diverge from the empirical distribution. In this study, we analyze the impact of knowledge editing on the alignment between token prediction probabilities and task accuracy by calculating confidence calibration before and after editing. Our results reveal that, for tasks requiring semantic understanding, the increase in token prediction probabilities tends to be smaller than the improvement in accuracy, suggesting that knowledge editing makes models underconfident in their predictions. language of the presentation: Japanese
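One common way to quantify the gap between prediction probabilities and accuracy is the expected calibration error (ECE). The sketch below is a generic illustration, not necessarily the metric used in this study; the equal-width binning, the bin count, and the helper `confidence_gap` are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    Uses equal-width bins over [0, 1]; a confidence of 1.0 falls in the last bin.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / total * abs(accuracy - avg_conf)
    return ece

def confidence_gap(confidences, correct):
    """Accuracy minus mean confidence; a positive value means underconfidence."""
    accuracy = sum(1 for ok in correct if ok) / len(correct)
    return accuracy - sum(confidences) / len(confidences)
```

Computing both quantities on the same evaluation set before and after an edit would show whether accuracy rose faster than confidence, which is the pattern the abstract describes.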

RIZA SETIAWAN SOETEDJO	M, 1st presentation	Natural Language Processing	渡辺　太郎,	Sakriani Sakti,	上垣外　英剛,	坂井　優介
title: Factuality-based Minimum Bayes Risk (MBR) Decoding for Faithful Summarization abstract: Summarization tasks, which aim to generate a concise summary from a source document, are challenged by the existence of multiple valid summaries. While current methods effectively create numerous candidate summaries, a significant gap remains: summarization models struggle to select the optimal summary by simultaneously considering the source document's content and the relationships among the other generated candidates. This limitation often leads to partially correct summaries. To address this, we propose Factuality-based Minimum Bayes Risk (FMBR) decoding. FMBR is a novel reranking method designed to identify the most representative summary from a pool of candidates. Unlike existing approaches, FMBR explicitly evaluates not only each candidate's consistency with the source document but also its consensus with the other candidate summaries. This dual consideration aims to produce summaries that are both accurate and reflect a stronger overall agreement, thereby improving the quality and completeness of automatically generated content. language of the presentation: English
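The dual scoring idea (consistency with the source plus consensus among candidates) can be sketched as a simple reranker. This is not the FMBR method itself: the unigram-F1 utility, the token-overlap proxy for factuality, the mixing weight `alpha`, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter

def unigram_f1(a, b):
    """Unigram F1 overlap between two strings (a toy utility function)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)

def source_support(candidate, source):
    """Crude factuality proxy: share of candidate tokens found in the source."""
    src = set(source.lower().split())
    tokens = candidate.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in src) / len(tokens)

def rerank(candidates, source, alpha=0.5):
    """Return the candidate with the best mix of source support and consensus."""
    best, best_score = None, float("-inf")
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        if others:
            consensus = sum(unigram_f1(cand, o) for o in others) / len(others)
        else:
            consensus = 0.0
        score = alpha * source_support(cand, source) + (1 - alpha) * consensus
        if score > best_score:
            best, best_score = cand, score
    return best
```

With `alpha=0.0` the function reduces to consensus-only selection, which is the usual MBR-style choice; raising `alpha` shifts weight toward agreement with the source.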

RATURI HIMANSHU	M, 1st presentation	Mathematical Informatics	池田　和司,	安本　慶一,	久保　孝富,	日永田　智絵,	LI YUZHE
Title: Smart Surveillance for Public Safety: Real-Time Tracking of Vulnerable Individuals Abstract: In highly crowded public environments, such as festivals, train stations, and disaster zones, tracking missing or vulnerable individuals in real time is a significant challenge. Occlusions, dense populations, and visual similarity among individuals often lead to tracking errors or missed detections. This research proposes a robust, real-time deep learning framework that combines YOLOv8 for fast and accurate person detection, MMpose for fine-grained pose estimation, and Vision Transformers utilizing Multi-Head Self-Attention (MHSA) to model spatial and temporal relationships for accurate movement prediction and identity tracking. The pipeline begins with YOLOv8 detecting individuals in the scene. Detected bounding boxes are processed by MMpose to extract 2D skeletal keypoints, capturing human posture and motion cues. These keypoints, alongside spatial image features, are encoded and passed into a Transformer architecture. Using MHSA, the model learns to focus on relevant pose patterns and temporal dynamics, enabling the tracking system to maintain consistent identities and anticipate trajectories—even when individuals are briefly occluded or partially visible. The framework is designed for real-time deployment, offering low-latency performance suitable for live surveillance or emergency response. It is evaluated on datasets such as CrowdHuman, PoseTrack, and custom footage from crowded environments. Metrics such as MOTA, ID switches, and inference speed are used to assess performance. Beyond technical contributions, the research also addresses the ethical challenges of surveillance technologies, emphasizing the importance of privacy, fairness, and responsible use. 
This work contributes a scalable and accurate solution for real-time human tracking in dense crowds by fusing pose-based motion understanding with the long-range attention capabilities of modern Vision Transformers. Language of the presentation: English
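For reference, MOTA is conventionally computed from per-frame error counts as one minus the ratio of all errors (false negatives, false positives, identity switches) to the number of ground-truth objects. The sketch below follows that standard definition; the dictionary keys and the function name are hypothetical and not taken from this work.

```python
def mota(frames):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames.

    `frames` is a list of dicts holding per-frame counts (key names are assumed).
    """
    fn = sum(f["false_negatives"] for f in frames)
    fp = sum(f["false_positives"] for f in frames)
    idsw = sum(f["id_switches"] for f in frames)
    gt = sum(f["ground_truth"] for f in frames)
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (fn + fp + idsw) / gt
```

Note that MOTA can be negative when the tracker makes more errors than there are ground-truth objects.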
