Colloquium B Presentations

Date and time: Wednesday, September 20, 4th period (15:10-16:40)

Venue: L1

Chair: 劉海龍 (Hailong LIU)

大宮　拓馬	M, 2nd presentation	Robot Learning	松原　崇充, 和田　隆広, 鶴峯　義久, 佐々木　光
title: Study on Iterative Learning Framework for Human-Robot Collaboration
Japanese title: 人ロボット協調動作の反復学習に関する研究
language of the presentation: Japanese
abstract: As the working-age population declines, robots are expected to automate tasks traditionally performed by humans, such as earthwork. However, fully replacing humans with robots in tasks that require coordination among multiple construction machines is difficult from the standpoint of societal implementation, so it is important to build systems in which humans and robots can work together in a shared environment. To this end, this research proposes a learning framework in which a human and a robot collaborate on a task. Specifically, a reinforcement learning policy, which learns from trial-and-error experience data, is trained together with a collaborating human policy to acquire collaborative behaviors. Because keeping a human present throughout the reinforcement learning process is costly, an imitation learning policy acquired from human demonstration data stands in for the human during training. An important feature of this setting is that, as the reinforcement learning policy is updated, the collaborative behavior required of the human also changes, which produces a collaboration gap. To reduce this gap, we alternate between collecting collaborative demonstration data and updating the reinforcement learning policy. As a preliminary evaluation, we conducted experiments in a simulation environment to acquire collaborative behaviors between a robot controlled by reinforcement learning and a human (or an imitation learning policy).

東　崇史	M, 2nd presentation	Robot Learning	松原　崇充, 和田　隆広, 花田　研太, 佐々木　光
title: Model-Based Reinforcement Learning for Target Grasping by Underwater Drones
Japanese title: 水中ドローンによる目標物把持のためのモデルベース強化学習
language of the presentation: Japanese
abstract: In recent years, underwater drones have increasingly been used for garbage collection and for the exploration of sunken ships and marine resources. In most of these applications, however, the drone is operated remotely by a human. To solve real problems in the ocean, underwater drones need to acquire object manipulation skills and operate autonomously. In this study, an underwater drone autonomously grasps a target object. The proposed system has two major components: an approximate Gaussian process based on random Fourier features, which learns the dynamics of the underwater environment, and model predictive control, which performs accurate control. We report movement control experiments and target grasping experiments conducted with the combined system in simulation and in a real environment.
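The random Fourier feature idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the presenter's implementation: the RBF kernel, the lengthscale, the feature count, and every function name below are assumptions chosen for illustration. The sketch only shows the feature map whose inner product approximates a kernel; fitting a linear model on these features (which yields the approximate Gaussian process) and the model predictive controller are not shown.

```python
import math
import random


def make_rff(input_dim, num_features, lengthscale, seed=0):
    """Build a random Fourier feature map for an RBF kernel (assumed kernel)."""
    rng = random.Random(seed)
    # Frequencies are sampled from the RBF kernel's spectral density,
    # a zero-mean Gaussian with standard deviation 1 / lengthscale.
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(input_dim)]
              for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return phi


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel value, used here only as a reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def approx_kernel(phi, x, y):
    """Kernel approximation: inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(phi(x), phi(y)))


# Hypothetical 3-D inputs (for example, a state-action vector of the drone).
x = [0.2, -0.1, 0.4]
y = [0.5, 0.3, 0.1]
phi = make_rff(input_dim=3, num_features=5000, lengthscale=1.0)
exact = rbf_kernel(x, y, lengthscale=1.0)
approx = approx_kernel(phi, x, y)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

Because the feature vector has a fixed length, regression on these features costs time linear in the number of data points, which is the usual motivation for this approximation when a learned model must be queried repeatedly inside a controller.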

内藤　優星	M, 2nd presentation	Robot Learning	松原　崇充, 和田　隆広, 花田　研太, 鶴峯　義久
title: Hierarchical Multi-Agent Reinforcement Learning with Dynamic Task Priority for Multi-Robot System
Japanese title: マルチロボットシステムのための動的タスク優先度付き階層型マルチエージェント強化学習
language of the presentation: Japanese
abstract: Having multiple robots collaborate on a task is expected to enable tasks that a single robot cannot achieve and to improve work efficiency. In recent years, multi-agent deep reinforcement learning (MARL) has been studied as a way to acquire cooperative policies for multiple agents. When MARL is applied to robots, however, the challenge includes not only decisions such as task allocation but also robot control, such as the command values sent to motors. For example, when transporting multiple objects, cooperation has to be considered at several levels: which object each robot should transport, and how the robots should combine their forces when several of them carry the same object. Furthermore, a decision such as "giving up" on a task in progress may be necessary when an object is too heavy for the number of robots available. To address these challenges, we propose hierarchical multi-agent deep reinforcement learning with dynamic task priority. The method extends MAPPO, one of the state-of-the-art MARL techniques, so that each agent has a neural network for decision-making, a neural network for robot control, and a dynamic task priority. The dynamic task priority is manipulated by the decision-making network and through communication with other agents, and the highest-priority task is executed by the robot control network. We apply the proposed method to object transportation tasks and report verification results from both simulation and real robots.
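The role of the dynamic task priority described in the abstract can be sketched as a small bookkeeping structure. This is purely illustrative: the abstract does not specify how priorities are represented or updated, so the additive update, the message format, the class `Agent`, and all task names are hypothetical. Neither neural network is shown; plain dictionaries of priority adjustments stand in for the decision-making network's output and for messages from other agents.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Agent:
    name: str
    # Hypothetical representation: task name -> current priority.
    priorities: Dict[str, float] = field(default_factory=dict)

    def apply_decision(self, deltas: Dict[str, float]) -> None:
        """Stand-in for the decision-making network adjusting priorities."""
        for task, delta in deltas.items():
            self.priorities[task] = self.priorities.get(task, 0.0) + delta

    def receive(self, message: Dict[str, float]) -> None:
        """Apply priority adjustments communicated by another agent."""
        self.apply_decision(message)

    def select_task(self) -> str:
        """Highest-priority task, which a control network would then execute."""
        return max(self.priorities, key=self.priorities.get)


robot_a = Agent("robot_a", {"box_1": 1.0, "box_2": 0.5})
robot_b = Agent("robot_b", {"box_1": 0.2, "box_2": 0.9})
first_a, first_b = robot_a.select_task(), robot_b.select_task()

# box_1 turns out to be too heavy for one robot, so robot_a asks for help
# by sending a message that raises box_1's priority for robot_b.
robot_b.receive({"box_1": 1.0})
helped_b = robot_b.select_task()

# Alternatively, robot_a can "give up" on box_1 by lowering its priority.
robot_a.apply_decision({"box_1": -0.8})
after_give_up = robot_a.select_task()

print(first_a, first_b, helped_b, after_give_up)
```

Keeping the priority as explicit state separates the question of which task to pursue from the question of how to move, which is the two-level split the abstract describes.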

佐藤　誠人	M, 2nd presentation	Robot Learning	松原　崇充, 和田　隆広, 鶴峯　義久, 佐々木　光
title: Robust Task and Motion Planning using Residual Reinforcement Learning for Contact-Rich Tasks
Japanese title: Contact-Richマニピュレーションタスクにおける残差強化学習を用いたRobust Task And Motion Planning
language of the presentation: Japanese
abstract: In recent years, with advances in robotics technology, robots are increasingly used not only for simple operations in settings such as factories but also for long-horizon tasks in complex settings such as residential environments. To accomplish such long-horizon tasks, robot motion planning research known as Task And Motion Planning (TAMP) has been actively pursued. While TAMP can plan motions over long horizons, it struggles with tasks that involve many contacts in real-world environments, where modeling errors and sensor noise can cause the task to fail. This study aims to develop a method that absorbs and corrects, in the trajectories generated by TAMP, the modeling errors arising from contact, friction, and the physical properties of objects, as well as the robot's sensor noise, so that robot operation remains reliable in real-world settings. To realize this correction, we employ iterative, trial-and-error residual reinforcement learning and verify its effectiveness on a gearbox assembly task that involves many contacts with the environment.
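In residual reinforcement learning, the learned policy outputs a correction that is added to a base controller's command. The sketch below shows only this composition step for one waypoint of a planned trajectory. It is a minimal illustration under stated assumptions, not the presenter's method: the function name `compose_action`, the element-wise clipping, the bound `max_residual`, and the numeric values are all hypothetical, and the policy that would produce the residual is not shown.

```python
from typing import List, Sequence


def compose_action(planned: Sequence[float],
                   residual: Sequence[float],
                   max_residual: float) -> List[float]:
    """Add a bounded learned correction to a planned command.

    planned:      command taken from the planner's trajectory
    residual:     correction proposed by a learned residual policy
    max_residual: assumed per-dimension bound that keeps the executed
                  command close to the plan
    """
    if len(planned) != len(residual):
        raise ValueError("planned and residual must have the same length")
    clipped = [max(-max_residual, min(max_residual, r)) for r in residual]
    return [p + r for p, r in zip(planned, clipped)]


# Hypothetical waypoint (metres) and a raw residual whose first component
# exceeds the bound and is therefore clipped to 0.05 before being added.
planned_step = [0.10, 0.20, 0.00]
raw_residual = [0.50, -0.01, 0.00]
command = compose_action(planned_step, raw_residual, max_residual=0.05)
print(command)
```

Bounding the residual is a common design choice because the plan already encodes the task structure, so the learned part only has to compensate for small contact and sensing errors.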