ISL Colloquium

Near-Optimal Algorithms for Imitation Learning

Jiantao Jiao – Assistant Professor, UC Berkeley

Thu, 13-Apr-2023 / 2:00pm / Packard 202

Note the non-standard time - Thu, 2:00pm

Abstract

We study the fundamental limits of and efficient algorithms for imitation learning in Markov decision processes (MDPs). It is known that Behavior Cloning (BC) achieves suboptimality growing quadratically in the horizon, a phenomenon termed error compounding in the literature. However, the only existing lower bound for BC concerns optimization error, and it has not been clear whether error compounding is a fundamental phenomenon related to sample inefficiency. We show that when the MDP transition function is unknown, every algorithm must suffer suboptimality that grows quadratically in the horizon, even if the optimization error is zero and the algorithm can interactively query the expert, as in the DAGGER setting. This negative result suggests that overcoming error compounding requires additional assumptions in imitation learning.

We present two approaches that mitigate quadratic error compounding: active queries with recoverable mistakes, and a known transition function. First, we establish that if the expert knows how to recover from a temporary mistake and continue on the optimal path, then a carefully designed no-regret learning algorithm that actively queries the expert can provably beat quadratic error compounding and achieve suboptimality growing linearly in the horizon. Second, we turn the intuition that knowledge of the transition function should be used to visit only states the expert has visited into a precise optimization problem named Mimic-MD, and introduce a novel technique named artificial trajectory synthesis that provably breaks the quadratic dependence on the horizon, improving the exponent to 3/2. Last, we show that under the additional assumption that the expert is optimal for the true reward function in the known-transition setting, one can further improve the exponent from 3/2 to 1 via a new algorithm named Mimic-Mixture in some special cases of MDPs.

We discuss implementation details of Mimic-MD and Mimic-Mixture, as well as a conjectured algorithm based on inverse reinforcement learning that we believe achieves optimal performance in very general settings.
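As a toy illustration of the error-compounding phenomenon the abstract describes (a sketch of the standard intuition, not the speaker's construction): suppose the learned policy deviates from the expert with some small probability eps at each of H steps and, once off the expert's path, collects no further reward. Its expected loss is then roughly eps * H^2 / 2, quadratic in the horizon. The names eps, H, and the no-recovery reward model below are assumptions made for this sketch only.

```python
import random

# Toy model (assumption for this sketch): the expert earns reward 1 at each
# of H steps; the imitator deviates with probability eps at every step and,
# once off the expert's path, collects no further reward.
def expected_suboptimality(eps: float, H: int, trials: int = 200_000) -> float:
    """Monte-Carlo estimate of the imitator's expected reward shortfall."""
    total_loss = 0.0
    for _ in range(trials):
        for t in range(H):
            if random.random() < eps:   # first deviation happens at step t
                total_loss += H - t     # reward for steps t..H-1 is forfeited
                break
    return total_loss / trials

if __name__ == "__main__":
    random.seed(0)
    eps = 0.01
    for H in (10, 20, 40):
        print(f"H={H:3d}  simulated loss={expected_suboptimality(eps, H):6.3f}"
              f"  eps*H^2/2={eps * H * H / 2:6.3f}")
```

Running this shows the simulated loss tracking eps * H^2 / 2 as H grows: doubling the horizon roughly quadruples the suboptimality, which is the quadratic growth that the talk's lower bound makes rigorous for unknown transitions.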

Bio

Jiantao Jiao is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. He co-directs the Center for the Theoretical Foundations of Learning, Inference, Information, Intelligence, Mathematics, and Microeconomics (CLIMB) at Berkeley, and is a member of the Berkeley Artificial Intelligence Research (BAIR) Lab and the Berkeley Laboratory for Information and System Sciences (BLISS). He received his Ph.D. from Stanford University in 2018. He is a recipient of the Presidential Award of Tsinghua University and the Stanford Graduate Fellowship. He was a semi-plenary speaker at ISIT 2015 and a co-recipient of the MobiHoc 2019 best paper award. His research interests are in statistical machine learning, high-dimensional and nonparametric statistics, mathematical programming, information theory, and their applications in robotics, natural language processing, computer vision, and distributed systems.