新任助教講演会(Lectures from New Assistant Professors)

日時(Datetime) Friday, June 24, 2022 (令和4年6月24日), 3rd period (13:30 -- 15:00)
場所(Location) エーアイ大講義室 (AI Large Lecture Room, L1)
司会(Chair) 花田 研太(Kenta Hanada)

講演者(Presenter) 鶴峯 義久(Yoshihisa Tsurumine), ロボットラーニング研究室 (Robot Learning Lab.)
題目(Title) Deep Reinforcement Learning with Smooth Policy Update for Robotic Cloth Manipulation
概要(Abstract) Deep Reinforcement Learning (DRL), which can learn complex policies from high-dimensional observations such as images, has been successfully applied to various tasks. It is therefore a promising approach for robots that must learn and perform daily activities such as washing and folding clothes, cooking, and cleaning. However, only a few studies have applied DRL to real robots. In this study, we apply DRL to cloth manipulation, a representative daily task, and focus on two main problems: (1) generating a huge number of samples on a real robot system is arduous because of the high sampling cost, and (2) learning requires a reward function to evaluate selected behaviors, but designing rewards for cloth, whose shape is flexible, is challenging. Our first aim is to apply sample-efficient DRL to real robotic cloth manipulation tasks. We employ a smooth policy update to enable stable learning from a small number of samples. Our second aim is to learn a cloth manipulation policy without explicit reward function design. To this end, we explore Generative Adversarial Imitation Learning (GAIL) for robotic cloth manipulation, which allows an agent to learn near-optimal behaviors from expert demonstrations and self-exploration without an explicit reward function. The proposed methods were first investigated in a simulated robot-arm reaching task. We then applied them to two real robotic cloth manipulation tasks: 1) flipping a handkerchief and 2) folding clothes. The methods were evaluated on the handkerchief-flipping task with a hand-engineered reward function and further investigated on clothes-folding tasks that resemble daily tasks.
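
As background for the talk, the sketch below illustrates the smooth (KL-regularized) policy update idea in a tabular setting: rather than jumping to the greedy policy, each update reweights the previous policy by exponentiated action values, so the new policy stays close to the old one. The function name, the tabular setting, and the step size eta are illustrative assumptions; this is a minimal sketch, not the speaker's deep-network implementation.

```python
import numpy as np

def smooth_policy_update(prev_policy, q_values, eta=1.0):
    """Hypothetical KL-regularized (smooth) policy update.

    The new policy stays close to the previous one:
        pi_new(a|s) proportional to pi_old(a|s) * exp(eta * Q(s, a)).
    Larger eta permits bigger policy changes; eta -> infinity recovers
    the greedy update. Shapes: (n_states, n_actions).
    """
    logits = np.log(prev_policy + 1e-12) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy usage: 2 states, 3 actions, starting from a uniform policy.
pi0 = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.2, 0.9]])
pi1 = smooth_policy_update(pi0, q, eta=2.0)
print(pi1)  # shifted toward high-Q actions, but still stochastic
```

Because each update is an interpolation rather than a greedy jump, the policy changes gradually even when value estimates from a small sample batch are noisy, which is the stability property the abstract appeals to.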

講演者(Presenter) 佐々木 光(Hikaru Sasaki), ロボットラーニング研究室 (Robot Learning Lab.)
題目(Title) Gaussian Process Policy Search with Latent Variables in Real-world Environments
概要(Abstract) Policy search reinforcement learning has been drawing much attention as a method for learning robot control. In particular, policy search using Gaussian process regression as the policy model can learn optimal actions with uncertainty estimates. However, it is difficult to naively apply Gaussian process policy search to real-world tasks because real-world environments, such as those in robotics, often involve various uncertainties, and this uncertainty typically demands a very complex state-to-action mapping to obtain high-performance actions. To overcome this complexity, this study focuses on latent variable modeling. We explore a latent variable modeling approach in Gaussian process policy search to capture the complexity of data sampled in uncertain environments, and derive an algorithm that simultaneously performs latent variable inference and policy learning, aiming to make policy search applicable to various real-world tasks with uncertainty. We focus on two sources of complexity: 1) multiple optimal actions that emerge from an ambiguously specified reward function, and 2) weak observations that contain little information about the state. For each, we design a policy model by introducing latent variables into the Gaussian process and derive policy update schemes based on variational Bayesian learning. The performance of the proposed policy search methods was verified in simulation and in a task using a robot manipulator. Finally, a policy learning framework based on Bayesian optimization with latent variables is proposed for application to actual heavy machinery, and its performance was verified on a real waste crane.
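
To illustrate why latent variables help with the first complexity (multiple optimal actions), the toy sketch below fits Gaussian processes to data where two equally good actions exist per state. A single GP averages the two modes into a poor policy, while conditioning on a latent mode variable recovers both. This is an assumed setup for illustration only: it uses ground-truth mode labels, whereas the talk's method infers the latent variables variationally.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data: for each state s, both +sin(s) and -sin(s) are optimal actions.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, np.pi, 200)
mode = rng.integers(0, 2, 200)  # hidden mode behind each sample
a = np.where(mode == 0, np.sin(s), -np.sin(s)) + 0.05 * rng.normal(size=200)

# A single GP policy smooths the two modes into a useless average...
gp_single = GaussianProcessRegressor(kernel=RBF(0.5), alpha=2.5e-3)
gp_single.fit(s[:, None], a)

# ...whereas one GP per latent mode captures both optimal actions.
# Here we use the true mode labels; the method infers them instead.
gps = [
    GaussianProcessRegressor(kernel=RBF(0.5), alpha=2.5e-3)
    .fit(s[mode == z][:, None], a[mode == z])
    for z in (0, 1)
]

s_test = np.array([[np.pi / 2]])
print("single GP:", gp_single.predict(s_test))                 # ~0: mode average
print("per-mode GPs:", [gp.predict(s_test)[0] for gp in gps])  # ~+1 and ~-1
```

The same modeling idea, with the mode treated as an unobserved variable and integrated out via variational Bayes, is what lets the policy model in the talk represent multiple optimal actions instead of collapsing them.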