Colloquium B Presentations

Date and time: September 18 (Fri), 3rd period (13:30-15:00)


Venue: L1

Chair: 日永田 智絵
武田 敏季 M, 2nd presentation, Intelligent System Control, 杉本 謙二, 池田 和司, 小林 泰介
title: Learning of Hidden Markov Dynamical System for Long-Term Prediction
abstract: In learning dynamical systems, an important challenge is to make long-term prediction both accurate and transparent. The variational autoencoder (VAE) is a promising approach to this problem because it can learn the latent space hidden in complex data, and that space can be utilized as a dynamical system. Until now, however, models trained in the latent space have been less accurate, since conventional methods solve optimization problems without explicitly taking long-term prediction into account or without the Markov property. This study therefore proposes a method for achieving long-term prediction through a Markov model in the latent space. Specifically, the variational lower bound of the log-likelihood of the observed series data is derived and maximized by training a neural network. A comparative analysis demonstrates the prediction performance of the proposed model. (A rough code sketch of this idea follows this entry.)
language of the presentation: Japanese
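
For the presentation above, the following is a minimal sketch of the general idea rather than the presenter's implementation: a VAE whose latent variables follow a learned Markov transition, trained by maximizing a variational lower bound (written here as minimizing its negative). The module sizes, the Gaussian observation model, and the linear transition are all assumptions made for illustration.

import torch
import torch.nn as nn

class LatentMarkovVAE(nn.Module):
    def __init__(self, obs_dim=16, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, obs_dim))
        self.transition = nn.Linear(latent_dim, latent_dim)  # Markov dynamics in latent space

    def negative_elbo(self, x_seq):
        # x_seq: tensor of shape (T, obs_dim); returns the negative variational lower bound.
        loss, z_prev = 0.0, None
        for x in x_seq:
            mu, logvar = self.encoder(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization trick
            loss = loss + ((self.decoder(z) - x) ** 2).sum()           # reconstruction term
            prior_mu = torch.zeros_like(mu) if z_prev is None else self.transition(z_prev)
            loss = loss + 0.5 * ((mu - prior_mu) ** 2                  # KL term against the
                                 + logvar.exp() - 1.0 - logvar).sum()  # Markov transition prior
            z_prev = z
        return loss

Under these assumptions, long-term prediction amounts to encoding the current observation once and repeatedly applying the transition module before decoding, so the rollout never leaves the latent space.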
 
ILBOUDO WENDYAM ERIC LIONEL M, 2nd presentation, Intelligent System Control, 杉本 謙二, 池田 和司, 小林 泰介
title: Robust Stochastic Gradient Descent optimization with Student-t based momentum
abstract: The remarkable achievements of deep neural network models rest on the development of excellent stochastic gradient descent methods. Deep-learning-based machine learning algorithms, however, have to find patterns between observations and supervised signals, even though these may include noise that hides the true relation between the two. To perform well even with such noise, we expect machine learning models to be able to detect outliers and discard them when needed. We therefore propose a new method to improve the robustness of stochastic gradient optimization algorithms to abnormal values (outliers), using the robust Student-t distribution as the core idea. We integrate our method into some of the latest stochastic gradient algorithms; in particular, the popular Adam optimizer is modified through our method. The resulting algorithm, called t-Adam, along with the other stochastic gradient methods integrated with our method, is shown to effectively outperform Adam and the original versions in terms of robustness against noise and outliers on diverse tasks, ranging from regression and classification to reinforcement learning problems. (A rough code sketch of the core idea follows this entry.)
language of the presentation: English
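
For the presentation above, the sketch below illustrates only the core idea, not the exact t-Adam update: each incoming gradient is weighted by a Student-t style factor that shrinks when the gradient lies far from the running first moment, and the rest follows the usual Adam recipe. The weight formula, the decay rule for the weight sum, and all constants are assumptions made for this illustration.

import numpy as np

def t_momentum_adam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                         eps=1e-8, nu=1.0):
    # state holds the first moment m, second moment v, weight sum W, and step count t.
    m, v, W, t = state["m"], state["v"], state["W"], state["t"] + 1
    d = grad.size
    # Student-t inspired weight: close to 1 for ordinary gradients, small for outliers.
    w = (nu + d) / (nu + np.sum((grad - m) ** 2 / (v + eps)))
    m = (W / (W + w)) * m + (w / (W + w)) * grad       # robust (weighted) first moment
    W = W * (2.0 * betas[0] - 1.0) / betas[0] + w      # decayed running sum of weights
    v = betas[1] * v + (1.0 - betas[1]) * grad ** 2    # usual second moment
    v_hat = v / (1.0 - betas[1] ** t)                  # bias correction (second moment only here)
    new_param = param - lr * m / (np.sqrt(v_hat) + eps)
    return new_param, {"m": m, "v": v, "W": W, "t": t}

# Hypothetical usage:
#   state = {"m": np.zeros_like(w0), "v": np.zeros_like(w0), "W": 9.0, "t": 0}
#   w0, state = t_momentum_adam_step(w0, grad_fn(w0), state)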
 
角川 勇貴 M, 2nd presentation, Intelligent System Control, 杉本 謙二☆, 池田 和司, 松原 崇充 (Specially Appointed Associate Professor)
title: Binarized P-Network for reinforcement learning of real-time robot control with FPGA
abstract: Recently, FPGAs have been attracting attention as power-saving, high-speed computing devices. In this study, we propose a deep reinforcement learning method suitable for FPGAs to control edge robots in real time and with low power consumption. In order to implement the neural network that approximates the policy function on an FPGA with limited memory resources, the method binarizes the operations and parameters, downsizing the network so that it suits FPGA implementation. We also apply the Gap Increasing Operator to the policy update to improve robustness against the degradation of function-approximation performance caused by downsizing. We applied the method to a simulation task and confirmed that it is robust to the degradation of function-approximation performance due to downsizing of the neural network. Real-time, power-saving robot control is therefore expected to become possible, because a policy obtained by deep reinforcement learning can be implemented on an FPGA with this method. (A rough code sketch of the binarization ingredient follows this entry.)
language of the presentation: Japanese
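
For the presentation above, the following sketch shows only the binarization ingredient, assuming sign-binarized weights with a straight-through estimator; it is not the presenters' BPN code. The layer name and sizes are hypothetical, and the Gap Increasing Operator used in the policy update is not shown.

import torch
import torch.nn as nn

class BinarizedLinear(nn.Linear):
    # Linear layer whose forward pass uses +/-1 weights, while gradients flow
    # straight through to the underlying real-valued weights.
    def forward(self, x):
        w_bin = self.weight.sign()
        w = w_bin.detach() + self.weight - self.weight.detach()
        return nn.functional.linear(x, w, self.bias)

# Hypothetical value/policy network small enough for FPGA block RAM:
q_net = nn.Sequential(BinarizedLinear(4, 64), nn.ReLU(), BinarizedLinear(64, 2))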
 
山之口 智也 M, 2nd presentation, Intelligent System Control, 杉本 謙二☆, 池田 和司, 松原 崇充 (Specially Appointed Associate Professor)
title: Learning Disentangled Latent Dynamics using Domain Randomization for Sim-to-real Policy Adaptation
abstract: Real-world robot learning for visual robotic manipulation is time-consuming, since many images must be collected, and it requires human intervention. A promising approach to this problem is to adapt a policy learned in simulation to the real world. However, the vision data and the dynamics in simulation differ from those of the real world, so the policy needs to overcome these reality gaps. Additionally, a policy with fewer adaptive parameters is desirable, because a large amount of real-world data is required to adapt a policy that has many adaptive parameters, such as the weights of a neural network. In this work, we tackle the vision and dynamics reality gaps by learning disentangled latent dynamics using domain randomization. We then model the adaptive parameters of the policy with a small number of dynamics-related parameters, enabling the policy to be adapted with only a small amount of real-world data. (A rough code sketch of the adaptation scheme follows this entry.)
language of the presentation: Japanese
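
For the presentation above, the following is a conceptual sketch only, with every module, dimension, and name assumed for illustration: a visual encoder and policy head are trained on domain-randomized simulation, while a small vector of dynamics-related parameters is the only part updated from the few available real-world data.

import torch
import torch.nn as nn

# Hypothetical components; all sizes are placeholders.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, 8))             # image -> latent state
dynamics_params = nn.Parameter(torch.zeros(4))         # the only parameters adapted on the real robot
policy_head = nn.Sequential(nn.Linear(8 + 4, 64), nn.ReLU(), nn.Linear(64, 2))

def act(image_batch):
    # image_batch: (B, 3, 64, 64). The encoder, trained with domain randomization,
    # is expected to map randomized visuals to a dynamics-relevant latent state.
    z = encoder(image_batch)
    p = dynamics_params.expand(z.size(0), -1)
    return policy_head(torch.cat([z, p], dim=-1))

# Simulation: optimize encoder, policy_head, and dynamics_params over randomized visuals/dynamics.
# Real world: freeze encoder and policy_head; update only dynamics_params from a few rollouts.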