Stochastics and Statistics Seminar Series
Martin Wainwright, MIT
Variational methods in reinforcement learning
Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty. At its heart lies the Bellman optimality operator, whose unique fixed point specifies an optimal policy and value function. In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees. For…
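As a rough illustration of the fixed-point statement in the abstract, the sketch below applies the Bellman optimality operator repeatedly (plain value iteration) on a tiny tabular problem. This is a standard textbook procedure and not one of the variational methods the talk covers. The two-state MDP, the discount factor, and the function names `bellman_optimality` and `value_iteration` are all assumptions made for illustration.

```python
import numpy as np

def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator once.

    P has shape (A, S, S): P[a, s, t] is the probability of moving s -> t under action a.
    R has shape (S, A): R[s, a] is the expected reward for action a in state s.
    """
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1), Q.argmax(axis=1)

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the operator from V = 0 until successive iterates differ by less than tol."""
    V = np.zeros(R.shape[0])
    policy = np.zeros(R.shape[0], dtype=int)
    for _ in range(max_iter):
        V_new, policy = bellman_optimality(V, P, R, gamma)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, policy
        V = V_new
    return V, policy

# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# being in state 1 pays reward 1, being in state 0 pays nothing.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 0.0],  # rewards in state 0
    [1.0, 1.0],  # rewards in state 1
])
gamma = 0.9

V_star, policy = value_iteration(P, R, gamma)
print(V_star, policy)
```

For a discount factor below one the operator is a contraction in the sup norm, which is why the iteration converges to the unique fixed point from any starting vector. In this toy problem the greedy policy switches out of state 0 and stays in state 1.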