ISL Colloquium


What is the Statistical Complexity of Reinforcement Learning?

Sham Kakade – Professor, Harvard

Thu, 14-Apr-2022 / 1:00pm / Zoom: https://stanford.zoom.us/meeting/register/tJckfuCurzkvEtKKOBvDCrPv3McapgP6HygJ


Abstract

A fundamental question in the theory of reinforcement learning is what (representational or structural) conditions govern our ability to generalize and avoid the curse of dimensionality. For supervised learning, these questions are well understood: practically, we have overwhelming evidence of the value of representation learning (say, through modern deep networks) as a means for sample-efficient learning, and, theoretically, there are well-known complexity measures (e.g., the VC dimension and Rademacher complexity) that govern the statistical complexity of learning. Providing an analogous theory for reinforcement learning is far more challenging: even characterizing the structural conditions that support sample-efficient generalization is far less well understood.
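
For concreteness, one such measure is the (empirical) Rademacher complexity of a function class; the sketch below uses standard textbook notation, not notation from the talk itself. For a class $\mathcal{F}$ and a sample $S = (x_1, \ldots, x_n)$,

$$ \hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right], \qquad \sigma_1, \ldots, \sigma_n \ \text{i.i.d. uniform on } \{-1, +1\}, $$

and standard results bound the generalization gap uniformly over $f \in \mathcal{F}$ by a quantity on the order of $\hat{\mathfrak{R}}_S(\mathcal{F}) + \sqrt{\log(1/\delta)/n}$, with probability at least $1 - \delta$.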

This talk will highlight recent advances toward characterizing when generalization is possible in reinforcement learning (in both online and offline settings), focusing on both necessary and sufficient conditions. In particular, we will introduce a new complexity measure, the Decision-Estimation Coefficient, which is provably necessary (and, essentially, sufficient) for sample-efficient interactive learning.
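
As a rough sketch of this quantity (following the form of the definition in Foster, Kakade, Qian, and Rakhlin, "The Statistical Complexity of Interactive Decision Making", 2021; the notation here is illustrative): for a model class $\mathcal{M}$ and a reference model $\bar{M}$,

$$ \operatorname{dec}_{\gamma}(\mathcal{M}, \bar{M}) = \inf_{p \in \Delta(\Pi)} \, \sup_{M \in \mathcal{M}} \, \mathbb{E}_{\pi \sim p}\left[ f^{M}(\pi_{M}) - f^{M}(\pi) - \gamma \, D_{\mathrm{H}}^{2}\big(M(\pi), \bar{M}(\pi)\big) \right], $$

where $\Pi$ is the decision space, $f^{M}(\pi)$ is the value of decision $\pi$ under model $M$, $\pi_{M}$ is the optimal decision for $M$, and $D_{\mathrm{H}}^{2}$ is the squared Hellinger distance. Intuitively, the coefficient measures the best achievable trade-off between incurring regret and acquiring information that distinguishes the true model from the reference model.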

Bio

Sham Kakade is a Gordon McKay Professor of Computer Science & Statistics at Harvard University. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods for reinforcement learning, Conservative Policy Iteration; the mathematical foundations of the widely used linear bandit and Gaussian process bandit models; tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award (2020), the IBM Pat Goldberg Best Paper Award (2007), and the INFORMS Revenue Management and Pricing Prize (2014). He was program chair for COLT 2011.

Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He then completed his Ph.D. in computational neuroscience at the Gatsby Unit at University College London, under the supervision of Peter Dayan. He was a postdoc in the Department of Computer Science at the University of Pennsylvania, where he broadened his studies to include computational game theory and economics under the guidance of Michael Kearns. Sham has been a Principal Research Scientist at Microsoft Research, New England; an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania; and an assistant professor at the Toyota Technological Institute at Chicago.