Colloquium B Presentations

Date and time: November 10 (Mon), 2nd period (10:30-12:00)

Venue: A504

Chair: TBD
TRAN VAN DUY M, 2nd presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: HPQEA: A Scalable and High-Performance Quantum Emulator with High-Bandwidth Memory for Diverse Algorithms Support
abstract: In recent years, there has been growing interest in quantum emulation. However, existing studies often struggle to achieve broad applicability, high performance, and efficient hardware resource utilization at the same time. To address these challenges, we present the High-Performance Quantum Emulation Accelerator (HPQEA), a quantum emulator based on the state-vector emulation approach. HPQEA has three main features: high-performance computing cores, an optimized controlled-NOT gate computation strategy, and effective utilization of high-bandwidth memory. Verification and evaluation on the Alveo U280 board show that HPQEA can emulate quantum circuits with up to 30 qubits while maintaining high fidelity and low mean square error. It outperforms comparable FPGA-based systems, achieving faster execution, supporting a wider range of algorithms, and requiring fewer hardware resources. Furthermore, it exceeds the NVIDIA A100 in normalized gate speed for systems with up to 20 qubits. These results demonstrate the scalability and efficiency of HPQEA as an effective hardware design for emulating diverse quantum algorithms.
language of the presentation: English
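(Editorial note, for background only: the sketch below is a minimal NumPy illustration of the state-vector emulation approach mentioned in the abstract, i.e. how single-qubit gates and CNOT act on the 2^n amplitude vector. Function names and qubit-ordering conventions are our own; this has no connection to the HPQEA hardware design itself.)

import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    # Reshape the 2**n amplitude vector so the target qubit gets its own axis,
    # apply the 2x2 gate along that axis, then flatten back.
    psi = state.reshape([2] * n_qubits)
    psi = np.moveaxis(psi, target, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

def apply_cnot(state, control, target, n_qubits):
    # Flip the target qubit only on the half of the state where the control bit is 1.
    psi = state.reshape([2] * n_qubits).copy()
    sel = [slice(None)] * n_qubits
    sel[control] = 1
    flip_axis = target if target < control else target - 1
    psi[tuple(sel)] = np.flip(psi[tuple(sel)], axis=flip_axis)
    return psi.reshape(-1)

# Example: prepare a 2-qubit Bell state (|00> + |11>) / sqrt(2).
n = 2
state = np.zeros(2 ** n, dtype=complex)
state[0] = 1.0
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
state = apply_single_qubit_gate(state, H, 0, n)
state = apply_cnot(state, 0, 1, n)
print(state)  # approximately [0.707, 0, 0, 0.707]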
 
BUI CAO DOANH D, interim presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Contributions of Deep and Continual Learning to Whole Slide Image Analysis
abstract: Digital computational pathology has rapidly evolved to support cancer diagnosis and prognosis. However, Whole Slide Images (WSIs) are extremely large, gigapixel-scale data that are challenging to transfer and store. As new cancer-related tasks continually emerge, retraining a separate model for each task becomes inefficient and does not scale. To address this, continual learning methods have been introduced in computational pathology, enabling models to learn sequentially from new data without forgetting previously acquired knowledge. Nonetheless, existing rehearsal-based and regularization-based continual learning approaches face several limitations. First, rehearsal methods require storing old samples, which conflicts with data privacy and storage constraints. Second, most methods assume a fixed number of classes per task, which contradicts the open-ended nature of clinical workflows, where classes (e.g., cancer types) are defined by experts or pathologists. Third, they typically assume that all training data are co-located on a single node, which restricts multi-institutional collaboration due to privacy and data-transfer limitations. With the emergence of vision-language pathology foundation models (VLMs), information from textual annotations can be leveraged to enhance WSI representations. This thesis develops an efficient continual learning framework based on pathology VLMs, focusing on three key aspects: (1) leveraging VLMs to enable more effective lifelong learning on WSIs; (2) comparing zero-shot VLM performance with training-based continual learning methods; and (3) introducing a buffer-free, distributed approach that reduces inter-institutional data transfer, protects privacy, and facilitates the creation of a unified model for WSI analysis.
language of the presentation: English
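(Editorial note, for background only: as context for point (2) of the abstract above, the sketch below shows the generic recipe for zero-shot classification with a vision-language model: embed class prompts with the text tower, embed the image or WSI patch, and rank classes by cosine similarity. The text encoder here is a random stand-in, not an actual pathology VLM, and all names and prompts are illustrative.)

import zlib
import numpy as np

def dummy_text_encoder(prompt, dim=512):
    # Stand-in for a VLM text tower: a deterministic pseudo-embedding per prompt string.
    return np.random.default_rng(zlib.crc32(prompt.encode())).standard_normal(dim)

def zero_shot_classify(image_emb, class_prompts, text_encoder=dummy_text_encoder):
    # Embed each prompt, L2-normalize both sides, and rank classes by cosine similarity.
    text_embs = np.stack([text_encoder(p) for p in class_prompts])
    text_embs /= np.linalg.norm(text_embs, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb)
    scores = text_embs @ image_emb
    return class_prompts[int(np.argmax(scores))], scores

# Example with made-up prompts; a real setup would obtain image_emb from the
# VLM's image tower applied to a WSI patch.
prompts = ["an H&E image of lung adenocarcinoma", "an H&E image of normal lung tissue"]
image_emb = np.random.default_rng(1).standard_normal(512)
label, scores = zero_shot_classify(image_emb, prompts)
print(label, scores)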
 
朝比奈 甲樹 D, interim presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Acceleration of Sparse Matrix Operations for GNN Inference with IMAX-SpMM and Memory Grouping Optimization for SoCs Based on DBSCAN and Rectangular Partitioning
abstract: This study proposes IMAX-SpMM, an acceleration method for Sparse Matrix-Matrix Multiplication (SpMM), the main performance bottleneck in Graph Neural Network (GNN) inference. Using a Coarse-Grained Linear Array (CGLA) architecture, the proposed approach coordinates DMA transfers and computation on the IMAX3 accelerator. It automatically detects contiguous address regions and merges multiple DMA operations to reduce transfer frequency and maximize data reuse. In addition, a variable-length instruction mapping mechanism dynamically adjusts the instruction sequence according to the sparsity pattern of each dataset, reducing unnecessary padding. Evaluation with Graph Convolutional Networks (GCNs) shows a 5.65 times speedup over a CPU, 3.59 times over a GPU, and up to 1,150 times higher energy efficiency, demonstrating the effectiveness of data-flow optimization on a CGLA for GNN inference. In a separate study, we address the efficiency of Memory Built-In Self-Test (MBIST) in large-scale SoC design. We propose a memory-grouping optimization method based on DBSCAN clustering combined with rectangular partitioning, which automatically determines the number of clusters while satisfying physical constraints such as placement and clock domains. The proposed method achieves both uniform group sizes and a minimal group count by recursively dividing regions according to design constraints such as element count, distance, and aspect ratio. Evaluation using SRAM placement data based on the ASAP7 PDK shows that the method reduces the number of groups by up to 48 percent and speeds up computation by about 87 times compared with conventional approaches.
language of the presentation: Japanese
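(Editorial note, for background only: to make the SpMM bottleneck in the abstract above concrete, here is a plain CSR-format reference implementation plus a tiny helper that merges contiguous column indices into single transfers, conceptually related to, but not the same as, the DMA coalescing described for IMAX3. All names are illustrative.)

import numpy as np

def csr_spmm(indptr, indices, data, dense):
    # Reference SpMM: A (CSR, m x k) times dense B (k x n). The inner loop's
    # irregular, row-dependent access to B is what makes SpMM memory-bound.
    m = len(indptr) - 1
    out = np.zeros((m, dense.shape[1]), dtype=dense.dtype)
    for row in range(m):
        for p in range(indptr[row], indptr[row + 1]):
            out[row] += data[p] * dense[indices[p]]
    return out

def coalesce_runs(sorted_cols):
    # Merge consecutive column indices into (start, length) runs, i.e. fetch one
    # contiguous block instead of issuing one transfer per nonzero element.
    runs = []
    start = prev = sorted_cols[0]
    for c in sorted_cols[1:]:
        if c == prev + 1:
            prev = c
        else:
            runs.append((start, prev - start + 1))
            start = prev = c
    runs.append((start, prev - start + 1))
    return runs

# Example: a 3x4 sparse matrix in CSR form multiplied by a 4x2 dense matrix.
indptr = np.array([0, 2, 3, 5])
indices = np.array([0, 1, 2, 1, 3])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
B = np.arange(8, dtype=float).reshape(4, 2)
print(csr_spmm(indptr, indices, data, B))
print(coalesce_runs([0, 1, 2, 5, 6, 9]))  # -> [(0, 3), (5, 2), (9, 1)]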
 
桑原 拓海 D, interim presentation, Computing Architecture, 中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN, Le Vu Trung Duong
title: Ultra Low Latency Spiking Locally Competitive Algorithm for Spatio-temporal Data and Hierarchical Networks
abstract: Dynamic Vision Sensors (DVS) output noisy, asynchronous spikes. The Spiking Locally Competitive Algorithm (S-LCA), inspired by the visual cortex, denoises such data efficiently but requires long-timestep Hebbian learning. Our previous work, Modern S-LCA (MS-LCA), adopts Backpropagation Through Time for fast convergence, but its dictionary does not encode temporal information and can therefore handle only static images. We propose Dynamic MS-LCA (DMS-LCA), which introduces temporal dictionaries and thus supports direct sparse modeling of DVS data. Furthermore, by feeding MS-LCA spikes into DMS-LCA, our method enables hierarchical processing. Unlike conventional LCA/LASSO approaches that rely only on the final solution, DMS-LCA exploits the convergence dynamics themselves. Experiments on N-MNIST demonstrated controllable sparsity (54.6-99.8%), and on POKER-DVS, Synaptic Operations (SynOps) were reduced by 92%. A two-stage MS-LCA + DMS-LCA model reduced CIFAR-10 reconstruction error by 42% and SynOps by 28% compared with S-LCA. These results show that DMS-LCA is an efficient framework for event-based sparse modeling.
language of the presentation: Japanese
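(Editorial note, for background only: as context for the LCA family referenced in the abstract above, the following is a minimal sketch of the classic, non-spiking Locally Competitive Algorithm, in which neuron potentials evolve under feedforward drive and lateral inhibition to yield a LASSO-like sparse code. It does not model spiking, temporal dictionaries, or any other DMS-LCA-specific mechanism.)

import numpy as np

def soft_threshold(u, lam):
    # Activation: zero out sub-threshold potentials, shrink the rest toward zero.
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca_sparse_code(x, dictionary, lam=0.1, tau=10.0, steps=200):
    # dictionary: (input_dim, n_neurons), columns assumed unit-norm.
    b = dictionary.T @ x                                   # feedforward drive
    inhibition = dictionary.T @ dictionary - np.eye(dictionary.shape[1])
    u = np.zeros(dictionary.shape[1])                      # membrane potentials
    for _ in range(steps):
        a = soft_threshold(u, lam)                         # sparse coefficients
        u += (b - u - inhibition @ a) / tau                # leaky dynamics + lateral competition
    return soft_threshold(u, lam)

# Example: recover a sparse code for an input built mostly from one dictionary atom.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0, keepdims=True)
x = 2.0 * D[:, 5] + 0.05 * rng.standard_normal(64)
a = lca_sparse_code(x, D, lam=0.2)
print(np.count_nonzero(a), a[5])  # few active coefficients, with atom 5 dominant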