Thu, 13-May-2021 / 4:30pm / Zoom: https://stanford.zoom.us/meeting/register/tJckfuCurzkvEtKKOBvDCrPv3McapgP6HygJ
Markov decision processes (MDPs) capture some of the most important aspects of decision making under uncertainty, and as such they are at the heart of many efforts to address decision making under uncertainty. However, MDPs are “flat”, with no structure, and as such, planning and learning in MDPs with multidimensional state spaces, common in applications, is provably intractable. Yet reinforcement learning methods have been quite successful in providing strong solutions to some of these seemingly intractable problems. In this talk I will present my view of how to think about these successes by presenting a framework where the key idea is to give algorithms hints that can create backdoors to crack otherwise intractable problems. The talk will then dive into categorizing hints based on whether they can indeed succeed at doing this, for the special case when the hints are given in the form of constraints on what value functions look like, in the context of planning with generative models, also known as simulation optimization. As we shall see, seemingly minor differences between hints can cause some hints to work while others fail.
Csaba Szepesvari is a Canada CIFAR AI Chair, the team lead for the “Foundations” team at DeepMind, and a Professor of Computing Science at the University of Alberta. He earned his PhD in 1999 from Jozsef Attila University in Szeged, Hungary. In addition to regularly publishing in top-tier journals and at top-tier conferences, he has (co-)authored three books. Currently, he serves as an action editor of the Journal of Machine Learning Research and as an associate editor of the Mathematics of Operations Research journal, in addition to serving regularly on program committees of various machine learning and AI conferences. Dr. Szepesvari’s main interest is developing principled, learning-based approaches to artificial intelligence (AI). He is the co-inventor of UCT, an influential Monte-Carlo tree search algorithm, a variant of which was used in the AlphaGo program that, in a landmark game, defeated the top Go professional Lee Sedol in 2016, ten years after the invention of UCT. In 2020, Dr. Szepesvari co-founded the weekly “Reinforcement Learning Theory virtual seminar series”, which showcases speakers presenting top theoretical work in the area of reinforcement learning and which is open to attendees from all over the world.