ISL Colloquium


Breaking the Sample Size Barrier in Reinforcement Learning

Yuxin Chen – Associate Professor, University of Pennsylvania

Thu, 6-Apr-2023 / 4:00pm / Packard 202

Abstract

Emerging reinforcement learning (RL) applications necessitate the design of sample-efficient solutions in order to accommodate the explosive growth of problem dimensionality. Despite the empirical successes, however, our understanding of the statistical limits of RL remains highly incomplete. In this talk, I will present recent progress towards settling the sample complexity in three RL scenarios. The first concerns offline RL, which learns from pre-collected data and must accommodate distribution shift and limited data coverage. We prove that model-based offline RL (a.k.a. the plug-in approach) achieves minimax-optimal sample complexity without any burn-in cost. This success in offline RL further motivates optimal algorithm design in online RL with reward-agnostic exploration, a scenario in which the learner is unaware of the reward functions during the exploration stage. The third scenario is multi-agent RL in zero-sum Markov games, assuming access to a simulator. We demonstrate how to overcome the curse of multi-agents and the long-horizon barrier all at once. Our results emphasize the fruitful interplay between high-dimensional statistics, online learning, and game theory. (See https://arxiv.org/abs/2204.05275, https://yuxinchen2020.github.io/publications/Reward-free-exploration.pdf, and https://arxiv.org/abs/2208.10458 for more details.)

This talk is based on joint work with Gen Li, Laixi Shi, Yuling Yan, Yuejie Chi, Jianqing Fan, and Yuting Wei.

Bio

Yuxin Chen is currently an associate professor in the Department of Statistics and Data Science at the University of Pennsylvania. Before joining UPenn, he was an assistant professor of electrical and computer engineering at Princeton University. He completed his Ph.D. in electrical engineering at Stanford University and was a postdoctoral scholar in the Department of Statistics at Stanford. His current research interests include high-dimensional statistics, nonconvex optimization, information theory, and reinforcement learning. He has received the Alfred P. Sloan Research Fellowship, the ICCM Best Paper Award (gold medal), the AFOSR and ARO Young Investigator Awards, the Google Research Scholar Award, and the Princeton Graduate Mentoring Award, and he was a finalist for the Best Paper Prize for Young Researchers in Continuous Optimization.