Colloquium B Presentations

Date and time: Wednesday, September 20, 2nd period (11:00-12:30)

Venue: L1

Chair: 北野和哉

WANG SHUYUAN	D, interim presentation	Robot Learning	松原　崇充, 池田　和司, 佐々木　光
title: Multimodal behavior cloning by overlapping mixture non-Gaussian Gaussian process
abstract: This work proposes a novel method, non-Gaussian overlapping mixtures of Gaussian processes (NGOMGP), for learning a complex policy model for real-world robotics tasks from a reasonable amount of data. NGOMGP models complex distributions by combining a multimodal regression model, sparse overlapping mixtures of Gaussian processes (SPOMGP), with a distribution transformation based on neural ordinary differential equations (neural ODE). We apply NGOMGP to learn control policies for robot tasks by imitation learning. We also investigate how the hyperparameters of NGOMGP affect computation time and policy performance. The proposed method is evaluated in two simulations.
language of the presentation: English

照岡　肇	M, 2nd presentation	Robot Learning	松原　崇充, 池田　和司, 鶴峯　義久, 花田　研太
title: Multi-task reinforcement learning following linear temporal logic with task regressions
title (Japanese): タスクの後退を考慮した線形時相論理に従うマルチタスク強化学習の提案
abstract: In tidying tasks performed by a mobile manipulator, introducing grasping while moving is expected to improve efficiency. Existing research on reinforcement learning (RL) with linear temporal logic (LTL) imposes the constraint that the LTL formula must always progress once the most recent task has finished, so that the reward function depends only on the most recent state and action and the Markov property is preserved. As a result, it is difficult to fully complete tasks in environments where unexpected disturbances can occur. We propose a method that uses RL to jointly train a policy that regresses (redoes) the LTL formula appropriately and a policy that follows the LTL formula. Our results indicate that the proposed method outperforms existing methods in environments with unexpected disturbances.

西浦　直哉	M, 2nd presentation	Robot Learning	松原　崇充, 安本　慶一, 鶴峯　義久, 佐々木　光
title: Robust Adversarial Reinforcement Learning for Teleoperation of Quadruped Robot
title (Japanese): 四足歩行ロボットのテレオペレーションのための敵対的ロバスト強化学習
abstract: Expectations are growing for the use of quadruped robots in tasks that are performed with their legs. Teleoperation, which exploits the high cognitive ability of a human operator, is a promising approach to flexible leg work, but it requires the posture of the quadruped robot to be stabilized at the same time as the work is carried out. In this study, we propose a robust adversarial reinforcement learning method for teleoperation in which the posture stabilization policy is treated as the protagonist and the teleoperation behavior as the adversary. Specifically, the leg subject to teleoperation is learned as an adversarial agent acting against posture stabilization. For this adversarial policy, we introduce an additional reward that adapts it to teleoperation. For validation, we conducted learning experiments with multiple patterns of Gaussian noise injected into the ideal trajectory and compared the performance with that of robust adversarial reinforcement learning.
language of the presentation: Japanese