Variational methods in reinforcement learning
Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty. At its heart lies the Bellman optimality operator, whose unique fixed point is the optimal value function and thereby determines an optimal policy. In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees. For policy evaluation problems based on on-line data, we present Krylov-Bellman boosting, which combines ideas from Krylov methods with non-parametric boosting. For policy optimization problems based on…