Colloquium B Presentations

Date and time: Tuesday, December 10, 3rd period (13:30-15:00)


Venue: L1

Chair: Isidro Butaslac
平野 雄太 M, 2nd presentation Human-AI Interaction Sakriani Sakti 渡辺 太郎 大内 啓樹 Faisal Mehmood
title: Diarize to Transcribe: Advancing Multi-Talker ASR with Decoder Self-Conditioning Diarization
abstract: The current mainstream approach to multi-talker automatic speech recognition (ASR) tasks, such as meeting transcription, is a pipeline system that concatenates independently trained modules for source separation, ASR, and speaker diarization. However, such complex multi-module systems are difficult to optimize globally and require significant development effort. In contrast, end-to-end (E2E) multi-talker ASR using a single neural network module avoids these drawbacks and is expected to replace pipeline systems. This study proposes an E2E multi-talker ASR model that incorporates auxiliary information indicating "who spoke when" to improve the recognition accuracy of overlapping speech. Based on an analysis of the baseline model, the proposed method uses the estimated auxiliary information in the model's decoder.
language of the presentation: English
 
WANG YINGJIE M, 2nd presentation Human-AI Interaction Sakriani Sakti 渡辺 太郎 大内 啓樹 Faisal Mehmood
title: Neural Vocoder with Frequency Domain Flow Matching for High-Fidelity Speech Synthesis
abstract: We introduce a neural vocoder that combines the advantages of frequency-domain modeling with Optimal Transport Conditional Flow Matching (OT-CFM), addressing the trade-off between the fast inference of GAN-based iSTFT methods and the high-quality but slow generation of diffusion models. Experimental results on the LibriTTS and JSUT datasets show that our approach achieves competitive UTMOS scores while running 40 times faster than real time on CPU and demonstrates robust cross-lingual generalization.
language of the presentation: English
 
ZHOU WANGZIXI M, 1st presentation Human-AI Interaction Sakriani Sakti 渡辺 太郎 大内 啓樹 Faisal Mehmood
title: Collaborative Human-Machine Feedback for Text-to-Speech Synthesis
abstract: Current text-to-speech (TTS) models have made significant progress in generating high-quality speech; however, emotional TTS remains a challenging task due to the limitations of synthesizing only a few predefined emotion types. Emotions are inherently diverse, and existing emotional speech datasets fail to capture the full spectrum of human emotions, limiting the ability of TTS models to represent emotions comprehensively. To address this challenge, this research proposes a TTS system that leverages human feedback via a discriminator to better capture the distribution of human emotions. For enhanced efficiency, we also aim to integrate machine feedback through a speech emotion recognition (SER) model, adopting a collaborative human-machine feedback approach instead of relying solely on human feedback. Our current experiments focus on training the generator to develop a model capable of utilizing continuous emotion vectors for generating emotional speech.
language of the presentation: English
 

Venue: L2

Chair: 佐々木航
灘井 美樹 M, 2nd presentation Large-Scale Systems Management 笠原 正治 池田 和司 井上 美智子 原 崇徳
title: Mathematical Model for Software Update Strategies in Large-Scale Server Systems
abstract: We model and analyze a large-scale outage caused by software updates from CrowdStrike and the subsequent recovery process using a stochastic-process model. The system consists of multiple servers, and information services are provided as long as the number of operational servers exceeds a specified threshold. Software updates are applied simultaneously to all servers at regular intervals; however, some servers may encounter failures due to issues arising during the updates. If the number of operational servers falls below the threshold due to these failures, the provision of information services is interrupted. Failed servers are repaired one by one, and service provision resumes when the number of operational servers reaches the threshold again. In this study, we model the number of operational servers as a discrete-time Markov chain, analyzing performance measures such as system availability and the average repair completion time. Numerical examples show how the performance measures are affected by the update interval and average repair time.
language of the presentation: Japanese
 
尾上 圭介 M, 1st presentation Mathematical Informatics 池田 和司 笠原 正治 久保 孝富 日永田 智絵 Li Yuzhe
title: Leveraging Student-t Processes to Enhance Performance Stability in Bayesian Optimization
abstract: While Gaussian Processes (GPs) are widely used as surrogate models in Bayesian Optimization (BO), Student-t Processes (TPs) offer a compelling alternative: they handle heavy-tailed data well and manage uncertainty robustly in the presence of anomalies. However, existing TP-based BO methods often fix the degrees-of-freedom parameter (ν) or use a generic prior, which leaves their potential for flexible uncertainty quantification underexplored. To address this, we propose MI-TPBO (TPBO Based on iterative update of ν for maximum information-gain), a novel algorithm that dynamically adjusts the degrees of freedom ν at each trial, reducing reliance on likelihood-based criteria. Furthermore, we aim to investigate the theoretical implications of surrogate model selection on BO performance, with a focus on regret analysis, to provide deeper insights into optimization under uncertainty.
language of the presentation: English
 
齋藤 正博 M, 1st presentation Mathematical Informatics 池田 和司 笠原 正治 久保 孝富 日永田 智絵 Li Yuzhe
title: Test Time Adaptation via Funnel Activation Adjustment
abstract: Deep learning models have demonstrated remarkable performance across various tasks, including image recognition. However, when deployed in real-world scenarios, these models often suffer from performance degradation due to distribution shifts between the training data (source domain) and test data (target domain). Test-Time Adaptation (TTA) has emerged as a promising approach to address this issue by adapting pre-trained models during inference using unlabeled test data, without requiring access to the source domain data. Among widely recognized TTA methods, Test Entropy Minimization (Tent) improves model performance under distribution shifts by updating channel-wise affine parameters in normalization layers to minimize the entropy of output predictions. Despite its effectiveness, Tent's reliance on channel-wise affine parameter updates limits the model's flexibility in adapting to more complex distribution shifts. In this study, we propose integrating Funnel Activation (FReLU), which includes convolutional processing to precisely control receptive fields and better capture features, into Tent. By adjusting receptive fields during TTA, our approach aims to enhance adaptability and flexibility, with the goal of achieving improved performance under complex distribution shifts.
language of the presentation: Japanese
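presentation title: Test-Time Adaptation Using an Activation Function That Includes Convolution
presentation summary: Deep learning models have demonstrated excellent performance across a variety of tasks, including image recognition. However, when these models are deployed in real-world scenarios, distribution shifts between the training data (source domain) and the test data (target domain) degrade performance. Test-Time Adaptation (TTA) has emerged as a promising approach to this problem. TTA adapts a pre-trained model during inference using unlabeled test data, improving performance without access to source-domain data. One widely recognized TTA method is Test Entropy Minimization (Tent). Tent improves model performance under distribution shift by updating the channel-wise affine parameters of the normalization layers so as to minimize the entropy of the output predictions. Despite its effectiveness, Tent is restricted to updating channel-wise affine parameters, which limits the model's flexibility in adapting to more complex distribution shifts. In this study, we therefore integrate into Tent the Funnel Activation (FReLU), an activation function that includes convolutional processing to precisely control receptive fields and better capture features. By adjusting receptive fields during TTA, we aim to improve adaptability and flexibility and thereby performance under complex distribution shifts.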
 
陳 俊豪 M, 1st presentation Mathematical Informatics 池田 和司 笠原 正治 久保 孝富 日永田 智絵 Li Yuzhe
title: Adapting Visual-Language Models for Medical Image Anomaly Detection in Pancreatic Cancer
abstract: Foundation models are trained on large amounts of broad data and are adaptable to a wide range of downstream tasks. They form the basis of all state-of-the-art systems across a wide range of tasks and have shown impressive generative and few-shot learning abilities. However, it is difficult for engineers in medicine to build foundation models, because anomalous medical image data are scarce. In this work, we propose a method that applies a foundation model to the medical domain to achieve good performance without expensive data.
language of the presentation: English
 

Venue: L3

Chair: Ahmad Kamal Nasution
石井 大智 M, 1st presentation Network Systems 岡田 実 林 優一 東野 武史
title: A Proposal of LPWA Signal Relay for MIMO Transmission over Analog RoF Network
abstract: Low-power wide-area (LPWA) technology provides wireless internet access for remote Internet of Things (IoT) devices. LPWA is superior to other radio interfaces in both power consumption and radio coverage, and multiple-input multiple-output (MIMO) transmission is widely used for wideband, reliable radio connections. This article considers a radio relay configuration for MIMO transmission that reduces the radio dead zones that arise in mountainous areas. It also discusses a method to compensate for the degradation in channel capacity that occurs when a MIMO transmission passes through a coupled wireless-wired-wireless channel.
language of the presentation: Japanese
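presentation title: LPWA MIMO Relay Transmission over Radio-over-Fiber for Wireless Access of IoT Devices
presentation summary: Low Power Wide Area (LPWA), used for wireless access by Internet of Things (IoT) devices, is characterized by low power consumption and wide-area coverage, and standards that support Multiple-Input Multiple-Output (MIMO) transmission for high-speed, highly reliable communication have recently come into use. This paper examines relay transmission using radio over fiber to eliminate radio dead zones, such as those in mountainous areas. Because the channel capacity of a MIMO transmission composed of multiple radio signals can decrease when the signals pass through a channel that couples wired and wireless links, we also examine countermeasures.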
 
原田 知幸 M, 1st presentation Network Systems 岡田 実 林 優一 東野 武史
title: Indoor location estimation using IR-UWB
abstract: Recently, indoor location estimation systems using ultra-wideband (UWB) wireless communication have been studied. Such a system has multiple anchors and measures the distances and distance differences between the anchors and the tag to be located. Time of flight (ToF) and time difference of arrival (TDoA) are the main positioning methods, but the former consumes a lot of power and the latter requires time synchronization between terminals. In this work, we focus on SF-TDoA, which does not require time synchronization between terminals, to estimate the location of tags. The proposed method is evaluated through computer simulations and a proof of concept.
language of the presentation: Japanese
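presentation title: A Study on Indoor Location Estimation Using IR-UWB
presentation summary: In recent years, indoor location estimation systems using ultra-wideband (UWB) wireless communication have been studied. Such a system deploys multiple anchors and measures the distances and distance differences between the anchors and the tag to be located. The main positioning methods are time of flight (ToF) and time difference of arrival (TDoA), but the former consumes a lot of power and the latter requires time synchronization between terminals. This work focuses on SF-TDoA, a TDoA variant that does not require time synchronization between terminals, to measure distance. For estimation, we use the Gauss-Newton (GN) method, which takes the spatial error gradient as its criterion, and we evaluate the proposed scheme through computer simulations and a proof of concept.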
 
白川 琢磨 M, 1st presentation Computational Behavioral Neuroscience 田中 沙織 佐藤 嘉伸 清川 清
title: Longitudinal Relationship of Adolescent Brain Development, Environmental Factors, and Problem Behaviors
abstract: Adolescence is a critical period marked by significant mental and physical changes. As a result, adolescents are more likely to experience problem behaviors. Previous research has highlighted the importance of considering the interactions among brain development, environmental factors such as child-family relationships, and behavior. However, few longitudinal studies have examined how early adolescent brain development and environmental factors affect later problem behaviors. This study addresses this gap by using longitudinal data from the Tokyo Teen Cohort to investigate the causal relationships between environmental factors, brain activity, and the emergence of problem behaviors over time. By analyzing developmental changes and causal relationships, the research aims to identify key factors that influence adolescent problem behaviors and contribute to improved support for adolescents and guidance for parents.
language of the presentation: Japanese
 
西谷 香紀 M, 1st presentation Computational Behavioral Neuroscience 田中 沙織 池田 和司 久保 孝富
title: Toward a unified understanding of decision-making: Model-based and Model-free strategies in the perceptual-action cycle
abstract: The human brain engages in decision-making through continuous interactions with the environment, a process conceptualized as the perceptual-action cycle. Within this framework, two distinct strategies are employed: model-based and model-free approaches. Examining the differences between these strategies enables a deeper understanding of individual variations in decision-making processes and maladaptive decision-making patterns observed in individuals with mental disorders. Previous research has predominantly utilized computational models that represent these strategies through simultaneous calculations and weighted summations. This study seeks to provide a unified theoretical perspective on the model-based and model-free mechanisms within the context of the perceptual-action cycle.
language of the presentation: Japanese