Colloquium B Presentations

Date: Thursday, September 19, 3rd period (13:30-15:00)


Venue: L1

Chair: 江口 僚太
朝比奈 甲樹 M, 2nd presentation, Computing Architecture (中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN)
title: An Energy-Efficient Aggregator for Graph Attention Networks with CGLA
abstract: Graph Attention Networks (GATs) have brought a breakthrough in the field of Graph Neural Networks (GNNs). Although many GAT-derived models are still being developed, the challenge is the large number of matrix products, especially Sparse Matrix-Matrix Multiplication (SpMM), in the Aggregation operation. Although GPGPUs and dedicated accelerators can improve the efficiency of SpMM, the huge power consumption of GPGPUs makes them unsuitable for edge devices. In addition, since GNN models are closely tied to their applications, dedicated accelerators are also a poor fit. We propose an energy-efficient inference process for GATs using a Coarse-Grained Linear Array (CGLA) with linear interconnections, called IMAX3, together with an IMAX-specific SpMM kernel. IMAX3 avoids reading and writing intermediate results to DDR memory, enabling efficient computation and efficient execution of GAT models.
language of the presentation: Japanese
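To illustrate why GAT aggregation is SpMM-bound, the following is a minimal sketch with toy dimensions and a made-up edge list; it is not the IMAX3 kernel, only the generic computation the abstract refers to, where aggregation multiplies an attention-weighted sparse adjacency against projected node features.

```python
import numpy as np

# Toy GAT aggregation: H' = A_alpha @ (H @ W), where A_alpha is the
# sparse adjacency weighted by attention coefficients alpha.
N, F_in, F_out = 4, 3, 2
rng = np.random.default_rng(0)

H = rng.standard_normal((N, F_in))      # node features
W = rng.standard_normal((F_in, F_out))  # learned projection

# Attention-weighted edges as (dst, src, alpha) triplets (illustrative values)
edges = [(0, 1, 0.6), (0, 2, 0.4), (1, 0, 1.0), (2, 3, 1.0), (3, 2, 1.0)]

Z = H @ W                      # dense GEMM part
out = np.zeros((N, F_out))
for dst, src, alpha in edges:  # SpMM part: the bottleneck targeted above
    out[dst] += alpha * Z[src]
```

The loop over edges is exactly the irregular, memory-bound access pattern that makes SpMM costly on general-purpose hardware.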
 
上谷 仁亮 M, 2nd presentation, Computing Architecture (中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN)
title: Implementation and Evaluation of LLM on a CGLA
abstract: Generative AI services like ChatGPT are currently attracting global attention. Concurrently, the shortage of processing resources like GPUs and the increasing power demand have become significant challenges, highlighting the critical need to balance processing performance and power efficiency. This study evaluated the performance of Large Language Models (LLMs) using the IMAX3 prototype, which is based on a Linear Array Coarse-Grained Reconfigurable Architecture (CGLA) proposed by our research group. IMAX3 was implemented on a Field Programmable Gate Array (FPGA), and its processing speed and power efficiency were compared with other computing platforms, including CPUs. During the evaluation, improvements were made, such as adding a conversion table from 4-bit integers to single-precision floating-point numbers in the IMAX3 floating-point unit. As a result, GGML, a library for running LLMs on CPUs, was successfully run on IMAX3. The computation time ratio reached 80%, demonstrating the potential of CGLA as a viable computing platform for LLMs.
language of the presentation: Japanese
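The 4-bit-integer-to-float conversion table mentioned above can be sketched as follows. This is only an illustration of the idea (a 16-entry lookup with a plain linear scale), not the actual table or scale factors used in the IMAX3 floating-point unit or in GGML's quantization formats.

```python
import numpy as np

# Hypothetical 16-entry lookup table mapping 4-bit codes (0..15) to
# float32 values, offset so code 8 maps to zero. The scale is illustrative.
SCALE = 0.25
TABLE = np.array([(i - 8) * SCALE for i in range(16)], dtype=np.float32)

def dequantize_q4(codes):
    """Map 4-bit integer codes to float32 via table lookup."""
    return TABLE[np.asarray(codes)]

vals = dequantize_q4([0, 8, 15])  # -> [-2.0, 0.0, 1.75]
```

A hardware table like this lets a floating-point unit consume 4-bit quantized weights directly, avoiding a separate dequantization pass in software.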
 
桑原 拓海 M, 2nd presentation, Computing Architecture (中島 康彦, 林 優一, 張 任遠, KAN Yirong, PHAM HOAI LUAN)
title: A Directly-Trained Spiking Locally Competitive Algorithm for Ultra-Fast LASSO Solver
abstract: Neuromorphic computing, which is inspired by the dynamics of biological nervous systems, has emerged as a novel computing paradigm capable of efficiently performing not only neural networks for cognitive tasks but also non-cognitive tasks such as sparse modeling and graph search. The Spiking Locally Competitive Algorithm (S-LCA) is an efficient LASSO solver with order-of-magnitude advantages in power consumption and latency. However, the conventional S-LCA requires a significant number of timesteps to converge, making it difficult to combine with state-of-the-art spiking neural networks for image recognition, which run in relatively few timesteps, or to use as preprocessing for event-based data. Therefore, we propose an S-LCA trained by Backpropagation Through Time for ultra-low-latency LASSO. In our proposed model, the L2 norm and Batch Normalization are used to improve accuracy. Through validation on image datasets, we achieved a significant reduction in timesteps, demonstrating the potential for fast operation and low energy consumption. Moreover, the results show that the proposed algorithm achieves accuracy comparable to non-spiking analog LCA.
language of the presentation: Japanese
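For context, the non-spiking (analog) LCA baseline that the abstract compares against can be sketched as below: competitive dynamics with lateral inhibition whose fixed point solves the LASSO problem min_a 0.5||x - Phi a||^2 + lam ||a||_1. The spiking variant and its BPTT training are not shown; the step size, iteration count, and function name here are illustrative choices.

```python
import numpy as np

def soft_threshold(u, lam):
    """Thresholding activation of LCA (the prox of the L1 penalty)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca_lasso(Phi, x, lam=0.1, tau=10.0, steps=2000):
    """Analog LCA: integrate membrane potentials u under lateral
    inhibition until the active coefficients solve LASSO."""
    _, N = Phi.shape
    b = Phi.T @ x                  # feedforward drive
    G = Phi.T @ Phi - np.eye(N)    # lateral inhibition (competition)
    u = np.zeros(N)
    for _ in range(steps):
        a = soft_threshold(u, lam)
        u += (b - u - G @ a) / tau  # Euler step of the LCA ODE
    return soft_threshold(u, lam)
```

The many Euler steps needed here correspond to the long convergence time that the proposed directly-trained spiking model aims to cut down.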