Colloquium A

Date	Mon. July 9, 2018, 3rd period (13:30-15:00)
Location	L1
Chair	Assist. Prof. Koichiro Yoshino
Presenter	Eiji Uchibe (Principal Researcher, Department of Brain Robot Interface, Computational Neuroscience Laboratories, Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International (ATR))
Title	Forward and inverse reinforcement learning and generative adversarial formulation
Abstract	Reinforcement Learning (RL) is a computational framework for finding an optimal policy through interaction with the environment, and it has achieved remarkable successes in a broad range of game-playing and control tasks. However, RL is also known to be sample-inefficient: it typically requires far more experience than a human does. One possible reason is that a reward function designed by the experimenters does not guide the learning agent effectively. In practice, it is quite difficult to design a reward function that encourages the desired behaviors while still making the task learnable. Inverse RL tackles this problem by recovering, from a set of expert demonstrations, a reward function that explains the expert's behavior. Recent work has connected forward and inverse RL with generative adversarial networks (GANs), which have shown remarkable success in domains such as image generation, video prediction, and machine translation. In this view, inverse RL tries to tell whether experiences are drawn from the expert or generated by forward RL, while forward RL tries to generate experiences that inverse RL cannot distinguish from the expert's. In this talk, I will briefly introduce this framework and several related methods. Then I will present our method, in which forward and inverse RL share part of their network structure. Inverse RL is implemented as logistic regression whose classifier is expressed in terms of the reward function and a state value function. Forward RL is derived from the same assumption as inverse RL and belongs to the soft (entropy-regularized) RL framework. We demonstrate that our method performs comparably to state-of-the-art methods while being more computationally efficient.
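To make the "inverse RL as logistic regression" idea concrete, here is a minimal sketch, not the speaker's implementation. It assumes a classifier logit of the form r(s) + γV(s') − V(s) − log π(a|s), as used in some adversarial IRL formulations; the talk's exact parameterization may differ. The linear features, the discount `GAMMA`, the toy states `A`/`B`, and the functions `logit` and `train` are all hypothetical. Expert transitions get label 1, learner (forward RL) transitions get label 0, and plain gradient descent on the logistic loss fits the reward and value weights jointly.

```python
import math

GAMMA = 0.9  # hypothetical discount factor


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def logit(w_r, w_v, s, s_next, log_pi):
    """Assumed classifier logit: r(s) + gamma * V(s') - V(s) - log pi(a|s).
    r and V are linear in (hypothetical) state feature vectors."""
    return dot(w_r, s) + GAMMA * dot(w_v, s_next) - dot(w_v, s) - log_pi


def train(data, dim, lr=0.1, epochs=200):
    """Logistic regression: label 1 = expert transition, 0 = learner transition.
    data holds (s, s_next, log_pi, label) tuples."""
    w_r, w_v = [0.0] * dim, [0.0] * dim
    n = len(data)
    for _ in range(epochs):
        g_r, g_v = [0.0] * dim, [0.0] * dim
        for s, s_next, log_pi, y in data:
            err = sigmoid(logit(w_r, w_v, s, s_next, log_pi)) - y  # dLoss/dlogit
            for i in range(dim):
                g_r[i] += err * s[i]                         # dlogit/dw_r = s
                g_v[i] += err * (GAMMA * s_next[i] - s[i])   # dlogit/dw_v
        w_r = [w - lr * g / n for w, g in zip(w_r, g_r)]
        w_v = [w - lr * g / n for w, g in zip(w_v, g_v)]
    return w_r, w_v


# Toy usage: the expert stays in state B, the learner stays in state A.
A, B = [1.0, 0.0], [0.0, 1.0]
LOG_PI = math.log(0.5)  # hypothetical learner policy log-probability
data = [(B, B, LOG_PI, 1)] * 5 + [(A, A, LOG_PI, 0)] * 5
w_r, w_v = train(data, dim=2)
p_expert = sigmoid(logit(w_r, w_v, B, B, LOG_PI))
p_learner = sigmoid(logit(w_r, w_v, A, A, LOG_PI))
print(f"r(A)={w_r[0]:.2f}  r(B)={w_r[1]:.2f}  D(expert)={p_expert:.2f}  D(learner)={p_learner:.2f}")
```

After training, the recovered reward is higher in the state the expert prefers, and the classifier separates expert from learner transitions. In an adversarial loop, a forward (soft) RL step would then update the learner's policy against this reward, which this sketch omits.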
Language	English
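Speaker Biography	Oct. 2015-present: Principal Researcher, Department of Brain Robot Interface, Computational Neuroscience Laboratories, Advanced Telecommunications Research Institute International (ATR); Oct. 2008-Sep. 2015: Group Leader, Neural Computation Unit, Okinawa Institute of Science and Technology Graduate University; Oct. 2005-Sep. 2008: Group Leader, Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology; Apr. 2004-Sep. 2005: Group Leader, Neural Computation Unit, JST Initial Research Project for the New Okinawa Graduate University; May 2003-Mar. 2004: Researcher, Department of Computational Neurobiology, Computational Neuroscience Laboratories, ATR; Oct. 2001-Apr. 2003: Researcher, Department 3, Human Information Science Laboratories, ATR; Apr. 2001-Sep. 2001: Researcher, JST ERATO Kawato Dynamic Brain Project; Apr. 1999-Mar. 2001: Researcher, JSPS Research for the Future Program project "Cooperative Distributed Vision for Dynamic Three-Dimensional Scene Understanding"; Mar. 1999: Completed the doctoral program, Department of Computer-Controlled Mechanical Systems, Graduate School of Engineering, Osaka University.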