NAIST-IS-MT0651204: RODRIGUES ALAN DE SOUZA

Model-Free and Model-Based Reinforcement Learning Strategies in the Acquisition of Sequential Behaviors

Alan de Souza Rodrigues (0651204)

Humans can learn novel behaviors from scratch or use knowledge acquired from past experience to facilitate and speed up learning. These two types of learning can be regarded as knowledge-free and knowledge-based learning, respectively. Reinforcement Learning (RL), a computational theory of adaptive optimal control, suggests two methods for action selection that correspond well to these types of learning: the Model-Free (MF) method, which uses predicted future rewards to reinforce executed actions, and the Model-Based (MB) method, which uses a forward model to predict the future states reached by hypothetical actions. We developed a new sequential action selection task paradigm to investigate how and when humans use MF and MB strategies and where in the brain these strategies are implemented. In task condition 1, subjects performed previously well-learned action sequences; in task condition 2, subjects could use learned key-map rules to plan new action sequences; and in task condition 3, subjects had to simultaneously learn a new key-map and action sequences. We tested subjects inside an fMRI scanner, and in this thesis we report the behavioral results. Analysis of subjects’ performance measures in light of RL theory suggests that subjects used a specific action selection strategy in each task condition: sequential motor memory in task condition 1; an MB strategy in task condition 2, which benefitted especially when subjects were given a delay period for planning and preparing actions; and, though the evidence is not fully clear, an MF strategy in task condition 3. The results imply that humans can flexibly use both MF and MB action selection strategies depending on the degree of prior knowledge and the time available for execution.
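
As an illustration of the distinction between the two methods, the following minimal sketch contrasts them on a toy problem. It is not the task paradigm or the analysis used in this thesis. The five-state track, the goal location, the reward of 1 at the goal, the learning parameters, and all function names are assumptions made only for this example. The MF learner is sketched here as tabular Q-learning with random exploration, and the MB planner as a breadth-first search through a forward model that is assumed to be perfectly known.

```python
import random
from collections import deque

# Hypothetical toy task (an assumption for illustration, not the thesis task):
# a 1-D track with states 0..4 and a rewarded goal at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left, step right


def env_step(state, action):
    """True task dynamics: move one step, clipped to the ends of the track."""
    return min(max(state + action, 0), N_STATES - 1)


# The MB agent is assumed to already hold a perfect forward model of the task,
# analogous to having learned the rules beforehand.
forward_model = env_step


def plan_model_based(start):
    """MB: search over hypothetical actions using the forward model only."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, seq = queue.popleft()
        if state == GOAL:
            return seq
        for action in ACTIONS:
            nxt = forward_model(state, action)  # predicted, not executed
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [action]))
    return None


def learn_model_free(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """MF: tabular Q-learning from executed actions and observed rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            action = rng.choice(ACTIONS)  # purely exploratory behavior
            nxt = env_step(state, action)  # the action is actually executed
            reward = 1.0 if nxt == GOAL else 0.0
            future = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            # Reinforce the executed action toward reward + predicted future reward.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_sequence(q, start, max_steps=10):
    """Read out the action sequence preferred by the learned values."""
    state, seq = start, []
    while state != GOAL and len(seq) < max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        seq.append(action)
        state = env_step(state, action)
    return seq


if __name__ == "__main__":
    print("MB plan:", plan_model_based(0))
    print("MF greedy sequence:", greedy_sequence(learn_model_free(), 0))
```

In this sketch the MB planner produces a sequence without executing any action, provided the forward model is available, whereas the MF learner needs many executed trials before its cached values favor a sequence that reaches the goal. This mirrors, in a simplified form, the contrast between planning from known rules and learning by trial and error.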