Stochastics and Statistics Seminar Series

Variational methods in reinforcement learning

March 24, 2023 @ 11:00 am - 12:00 pm

Martin Wainwright, MIT


Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty.  At its heart lies the Bellman optimality operator, whose unique fixed point specifies an optimal policy and value function.  In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees.  For policy evaluation problems based on on-line data, we present Krylov-Bellman boosting, which combines ideas from Krylov methods with non-parametric boosting.  For policy optimization problems based on historical data, we describe a variational formulation that combines Galerkin approximation and mirror descent.  We discuss an oracle inequality that trades off optimally between the value and statistical uncertainty.

Based on joint work with Eric Xia (MIT), Andrea Zanette (Berkeley) and Emma Brunskill (Stanford).

Bio: Martin Wainwright is the Cecil H. Green Professor of EECS as well as a Principal Investigator in LIDS. He previously held the Howard Friesen Chair, with a joint appointment between the Department of Statistics and the Department of EECS at the University of California, Berkeley. He has made seminal contributions in high-dimensional statistics, control and optimization, and statistical machine learning.

Among other awards (including the George M. Sprowls Prize for the thesis he developed at LIDS), he has received the COPSS Presidents’ Award (2014) from the Joint Statistical Societies, the David Blackwell Lectureship (2017) and the Medallion Lectureship (2013) from the Institute of Mathematical Statistics, and Best Paper awards from the IEEE Signal Processing Society and the IEEE Information Theory Society. He was a section lecturer at the International Congress of Mathematicians in 2014.

