Colloquium B Presentations

Date and time: September 20 (Thu), period 1 (9:20-10:50)


Venue: L1

Chair: Gustavo Garcia
青谷 拓海 M, 2nd presentation, Intelligent System Control: 杉本 謙二, 小笠原 司, 松原 崇充, 小林 泰介
title: Bottom-up Multi-agent Reinforcement Learning for Selective Cooperation
abstract: Multi-agent systems have applications, such as cooperative transport, in various real-world domains. Because of the complexity inherent in multi-agent systems, however, handling them by preprogramming is difficult. Multi-agent reinforcement learning (MARL), a framework in which multiple agents in the same environment learn their policies simultaneously through reinforcement learning, is therefore receiving attention. In conventional MARL, although decentralization is essential for feasible learning, the agents' rewards have been allocated by a centralized system in the environment. Instead of such "top-down" MARL, and to achieve fully distributed autonomous systems, we tackle a new paradigm named "bottom-up" MARL, in which each agent receives its own reward. For orderly group behaviors to emerge, bottom-up MARL requires the agents to share their respective rewards; such behaviors cannot be acquired merely by maximizing the mean of the rewards. We therefore propose an architecture with three components: estimating the rewards of other agents; selecting which rewards to reinforce based on their correlation; and promoting exploration to find unknown correlations. Numerical simulations performed in stages verify that every component of the proposed architecture is essential. A similar task is also accomplished in a dynamics simulation under the same conditions as the actual robots.
language of the presentation: Japanese
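
The correlation-based selection step mentioned in the abstract can be pictured with a toy sketch. This is not the presenter's architecture: the Pearson correlation, the fixed `threshold`, and the functions `pearson`, `select_rewards`, and `shaped_reward` are hypothetical choices made only for illustration, and the reward-estimation and exploration components are omitted. Given reward histories (the agent's own and the estimated ones of other agents), the agent keeps only the other agents' rewards that are strongly correlated with its own and adds them to its learning signal, instead of averaging all rewards.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_rewards(own_history, others_history, threshold=0.5):
    """Hypothetical rule: indices of other agents whose (estimated) reward history
    correlates with this agent's own reward history above `threshold`."""
    return [j for j, hist in enumerate(others_history)
            if pearson(own_history, hist) > threshold]


def shaped_reward(own_reward, others_rewards, selected):
    """Hypothetical learning signal: own reward plus the selected agents' rewards."""
    return own_reward + sum(others_rewards[j] for j in selected)


own = [0.0, 1.0, 2.0, 3.0, 4.0]
others = [[0.0, 2.0, 4.0, 6.0, 8.0],   # rises with ours: correlation 1.0
          [4.0, 3.0, 2.0, 1.0, 0.0],   # opposes ours: correlation -1.0
          [1.0, 1.0, 1.0, 1.0, 1.0]]   # constant: treated as correlation 0.0
selected = select_rewards(own, others)
print(selected)                                         # [0]
print(shaped_reward(1.0, [2.0, -5.0, 0.5], selected))   # 3.0
```

Averaging all three reward streams would mix in the opposing agent's reward; the selection keeps only the aligned one.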
 
金子 拓光 M, 2nd presentation, Intelligent System Control: 杉本 謙二, 小笠原 司, 松原 崇充, 小林 泰介
title: A Data-driven Approach for Waste Incineration Plant using Kalman Variational Auto-Encoders
abstract: In a waste incineration plant, an automatic operation system that can burn waste stably while generating electricity is desired. Automatic control of a plant requires identifying its hidden dynamics model from time-series data. However, because of the uncertainty in the waste, the plant data are high-dimensional and complicated, which makes identification by first-principles modeling difficult. We therefore constructed a data-driven time-series modeling method based on the Kalman Variational Auto-Encoder (KVAE), a system identification method that uses deep learning. In this presentation, we compare the prediction performance of the constructed model with that of an LSTM.
language of the presentation: Japanese
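
In a KVAE, a variational auto-encoder maps each observation to a low-dimensional latent code, and a linear Gaussian state-space model over those codes is inferred with a Kalman filter. The sketch below shows only that Kalman predict/update recursion, under simplifying assumptions: a scalar state, and fixed, hand-picked `A`, `C`, `Q`, `R` (the full model uses matrices that are learned, not constants). It is not the plant model from the talk; the function names and the data are hypothetical.

```python
def kalman_filter(observations, A=1.0, C=1.0, Q=0.01, R=0.25, mu0=0.0, P0=1.0):
    """Scalar Kalman filter for the linear Gaussian state-space model
         z_t = A z_{t-1} + w_t,  w_t ~ N(0, Q)   (latent dynamics)
         a_t = C z_t + v_t,      v_t ~ N(0, R)   (observed latent code)
    Returns the filtered means and variances of z_t given a_1..a_t."""
    mu, P = mu0, P0
    means, variances = [], []
    for a in observations:
        # Predict: push the previous posterior through the linear dynamics.
        mu_pred = A * mu
        P_pred = A * P * A + Q
        # Update: correct the prediction with the new observation a_t.
        S = C * P_pred * C + R        # innovation variance
        K = P_pred * C / S            # Kalman gain
        mu = mu_pred + K * (a - C * mu_pred)
        P = (1.0 - K * C) * P_pred
        means.append(mu)
        variances.append(P)
    return means, variances


def predict_next(mu, P, A=1.0, C=1.0, Q=0.01, R=0.25):
    """One-step-ahead predictive mean and variance of the next observation a_{t+1}."""
    mu_pred = A * mu
    P_pred = A * P * A + Q
    return C * mu_pred, C * P_pred * C + R


# Noisy readings of a roughly constant latent value (hypothetical data).
obs = [2.3, 1.8, 2.1, 1.9, 2.2, 2.0, 1.7, 2.1]
means, variances = kalman_filter(obs)
print(round(means[-1], 2), round(variances[-1], 3))
print(predict_next(means[-1], variances[-1]))
```

`predict_next` is the kind of one-step forecast whose accuracy a comparison against an LSTM would measure.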
 
佐々木 光 M, 2nd presentation, Intelligent System Control: 杉本 謙二, 小笠原 司, 松原 崇充, 小林 泰介
title: Multimodal Policy Search using Overlapping Mixtures of Sparse Gaussian Process Prior
abstract: We present a novel policy search reinforcement learning algorithm, based on Gaussian processes, that can deal with multimodality in optimal control policies. Previously proposed non-parametric policy search algorithms employ Gaussian process (or kernel) regression as the control policy model. Although such models can be trained with a reasonable amount of training data even with high-dimensional sensor input, they cannot capture multimodality in policies, which is often required in robotics tasks. We explore an alternative approach that employs Overlapping Mixtures of Gaussian Processes (OMGPs) as the control policy, in which all the GPs in the mixture are global and overlap in the input space. We then derive a novel policy search algorithm based on variational Bayesian inference. To validate its effectiveness, we applied the algorithm to two typical robotic tasks in simulation. The simulation results demonstrate that our algorithm can efficiently learn multimodal policies even with high-dimensional sensor input.
language of the presentation: Japanese
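
The limitation that motivates OMGPs can be seen with plain GP regression. In this hypothetical toy dataset, every input state appears twice, with actions +1 and -1, as if a robot could equally well pass an obstacle on the left or on the right. A single GP's posterior mean averages the two modes to 0, which is neither action. The RBF kernel, the noise level, and the helper names are assumptions; the OMGP policy and the variational policy search from the abstract are not implemented here.

```python
import math


def rbf(x1, x2, length=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior_mean(xs, ys, x_star, noise=0.1):
    """Posterior mean k(x*, X) (K + noise I)^{-1} y of a single zero-mean GP."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_star, x) * w for x, w in zip(xs, alpha))


# Hypothetical bimodal demonstrations: each state has two equally good actions.
states = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
actions = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(gp_posterior_mean(states, actions, 1.0))  # approximately 0: the modes cancel
```

A mixture of overlapping GPs can instead assign each demonstration to one of several global GPs, so each component keeps one mode.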
 
杉野 峻生 M, 2nd presentation, Intelligent System Control: 杉本 謙二, 小笠原 司, 松原 崇充, 小林 泰介
title: Continual Learning using Modularity of Structured Reservoir Computing
abstract: In recent years, neural networks (NNs) have become established as powerful solvers for problems that are difficult to solve analytically because of their complexity. NNs, however, have a critical problem called catastrophic forgetting: memories of tasks already learned are easily overwritten by memories of a newly learned task. This problem interferes with the continual learning required of autonomous robots. Catastrophic forgetting is thought to be caused by backpropagation overwriting the whole network. This study therefore develops a network that mitigates catastrophic forgetting using reservoir computing (RC), a model in which only the output layer of a recurrent NN is trained. Instead of the random network used in general RC, the network structure and its weights are designed based on a fractal complex network. This network modularizes the memories for multiple tasks, thereby mitigating catastrophic forgetting.
language of the presentation: Japanese
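
The idea of training only the output layer can be sketched with a minimal echo state network, a common form of reservoir computing. The recurrent weights `W` and input weights `w_in` are random and stay fixed; only the readout `w_out` is fit, by ridge regression. Assumptions: the reservoir is a plain random network (not the fractal, modular structure described in the abstract), rescaled so that its maximum absolute row sum is 0.9, and the toy task is one-step-ahead prediction of a sine wave. All function names are hypothetical.

```python
import math
import random


def make_reservoir(n, seed=0, scale=0.9):
    """Random, fixed reservoir. W is rescaled so its maximum absolute row sum equals
    `scale` < 1; since tanh is 1-Lipschitz, the state update is then a contraction in
    the max-norm, which gives the echo state property."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    max_row = max(sum(abs(v) for v in row) for row in W)
    W = [[scale * v / max_row for v in row] for row in W]
    return W, w_in


def run_reservoir(W, w_in, inputs, x0=None):
    """Collect states x_t = tanh(W x_{t-1} + w_in u_t); W and w_in are never trained."""
    n = len(w_in)
    x = list(x0) if x0 is not None else [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(n)) + w_in[i] * u)
             for i in range(n)]
        states.append(x)
    return states


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train_readout(states, targets, ridge=1e-4):
    """Ridge regression for the readout only: (X^T X + ridge I) w = X^T y."""
    n = len(states[0])
    XtX = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Xty = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(XtX, Xty)


def readout(w_out, state):
    return sum(w * s for w, s in zip(w_out, state))


# Toy task: predict u_{t+1} from the reservoir state driven by u_0..u_t.
W, w_in = make_reservoir(20)
u = [math.sin(0.2 * t) for t in range(300)]
states = run_reservoir(W, w_in, u[:-1])
washout = 50                                   # drop transients from x_0
w_out = train_readout(states[washout:], u[1 + washout:])
errors = [(readout(w_out, s) - y) ** 2
          for s, y in zip(states[washout:], u[1 + washout:])]
print(sum(errors) / len(errors))               # training mean squared error
```

Because no gradient ever reaches `W`, learning a task changes only a readout; the abstract builds on this property and adds a structured reservoir so that memories for different tasks stay in separate modules.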