佐伯 雄飛 | M, 2nd presentation | Intelligent System Control | 杉本 謙二, | 和田 隆広, | 小林 泰介 |
title: Model-Free Reinforcement Learning with Exponential Moving Average Filter for Unknown Object Tracking
abstract: Active devices such as pan/tilt/zoom cameras are becoming more popular. Therefore, robot control using reinforcement learning, which can automatically acquire optimal behavior in an uncertain environment, has attracted a lot of attention. A key problem in controlling a robot with reinforcement learning is noise, which may prevent correct learning. Although reinforcement learning can handle uncertainty probabilistically, there is a limit to how much noise it can tolerate, and its learning performance inevitably degrades. In this study, I apply model-free reinforcement learning, which makes state-dependent decisions, to a filter in order to reduce the impact of noise. I also explain a framework in which the filter value changes adaptively depending on the situation. language of the presentation: Japanese | |||||
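As a rough illustration of the filter named in the title above, the following is a minimal sketch of an exponential moving average filter whose smoothing coefficient may change at every step. The abstract does not state how the coefficient is chosen, so the function name `ema_filter` and the `alphas` argument are assumptions made for illustration: here the coefficients are simply passed in, whereas in the presented study they would presumably come from the learned, state-dependent policy.

```python
from typing import List, Sequence


def ema_filter(observations: Sequence[float], alphas: Sequence[float]) -> List[float]:
    """Exponential moving average with a per-step coefficient (hypothetical helper).

    Implements y_t = alpha_t * x_t + (1 - alpha_t) * y_{t-1}, with y_0 = x_0.
    alpha_t = 1 passes the raw observation through; alpha_t = 0 holds the
    previous filtered value.
    """
    if len(observations) != len(alphas):
        raise ValueError("observations and alphas must have the same length")

    filtered: List[float] = []
    previous = None
    for x, alpha in zip(observations, alphas):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("each alpha must lie in [0, 1]")
        # The first sample initializes the filter state.
        if previous is None:
            previous = x
        else:
            previous = alpha * x + (1.0 - alpha) * previous
        filtered.append(previous)
    return filtered
```

A small coefficient suppresses noise at the cost of a slower response, while a large one tracks fast changes but passes more noise through, which is the trade-off an adaptive coefficient is meant to balance.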
綿貫 零真 | M, 2nd presentation | Intelligent System Control | 杉本 謙二, | 和田 隆広, | 小林 泰介 |
title: Sparse Latent Space Acquisition with Variational Autoencoders Based on Tsallis Statistics
abstract: In machine learning, transforming features into a low-dimensional latent space has advantages such as speeding up learning and suppressing overfitting. In addition, when each feature in the latent space is independent and the latent space is sparse, the overlap of information between features can be reduced. Such a sparse latent space can be useful for acquiring a policy for a robot with high-dimensional sensors such as a camera. To obtain the sparse latent space, a variational autoencoder based on Tsallis statistics is rearranged and analyzed. Using visual information from a car racing simulation, the proposed method, which has a more natural implementation than the previous method, can extract the sparse latent space appropriately. language of the presentation: Japanese | |||||
田原 熙昻 | M, 2nd presentation | Intelligent System Control | 杉本 謙二, | 和田 隆広, | 松原 崇充 |
title: Disturbance-injected Robust Imitation Learning with Task-achievement
abstract: An imitation learning method called DART injects level-optimized noise into the demonstrator's actions to learn robust policies. DART assumes that the demonstrator can perfectly accomplish the given task; however, this assumption may not always hold. In this study, we propose a novel imitation learning method based on the degree of task achievement. We confirmed that the proposed method can learn robust policies in both simulation and real-robot experiments. language of the presentation: Japanese | |||||