Title: Optimization Aspects of Temporal Abstraction in Reinforcement Learning


Abstract: Temporal abstraction refers to the idea that complicated sequential decision-making problems can sometimes be simplified by considering the "big picture" first. In this talk, I will give an overview of some of my work on learning such temporal abstractions end-to-end within the "option-critic" architecture (Bacon et al., 2017). I will then explain how other related hierarchical RL frameworks, such as Feudal RL by Dayan and Hinton (1993), can also be approached under the same option-critic architecture. However, we will see that this formulation leads to a so-called "bilevel" optimization problem. While this is a more difficult problem, the good news is that the literature on bilevel optimization is rich and many of its tools have yet to be re-discovered by our community. Finally, I will show how "iterative differentiation" techniques (Griewank and Walther, 2008) can be applied to our problem while providing a new interpretation of the "inverse RL" approach of Rust (1988).


Bio: Pierre-Luc Bacon is a postdoc in Emma Brunskill's group. He completed his PhD with Doina Precup at McGill University in 2018. His research focuses on temporal abstraction and representation learning in RL.