Separating Estimation from Decision Making in Contextual Bandits

Name: Separating Estimation from Decision Making in Contextual Bandits
Start: 2020-09-25T11:00:00-04:00
End: 2020-09-25T12:00:00-04:00
Location: online

September 25, 2020 @ 11:00 am - 12:00 pm

Dylan Foster, MIT

Abstract: The contextual bandit is a sequential decision-making problem in which a learner repeatedly selects an action (e.g., a news article to display) in response to a context (e.g., a user’s profile) and receives a reward, but only for the action they selected. Beyond the classic explore-exploit tradeoff, a fundamental challenge in contextual bandits is to develop algorithms that can leverage flexible function approximation to model similarity between contexts, yet have computational requirements comparable to classical supervised learning tasks such as classification and regression. To this end, we provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. Conceptually, our results show that it is possible to provably separate estimation and decision making into distinct algorithmic building blocks, and that this can be effective both in theory and in practice. Time permitting, I will discuss extensions of these techniques to more challenging reinforcement learning problems.
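
As a rough illustration of the separation described in the abstract, the sketch below pairs a toy online regression oracle (estimation) with an action-sampling rule (decision making). The abstract does not say which sampling rule the talk uses; inverse-gap weighting is assumed here as one known way to turn reward predictions into an exploration distribution. The names `inverse_gap_weighting`, `RunningMeanOracle`, `run_bandit`, and the parameter `gamma` are all hypothetical and chosen for this example only.

```python
import random


def inverse_gap_weighting(predictions, gamma):
    """Decision step: map predicted rewards to a sampling distribution.

    Assumed rule (not taken from the abstract): each non-greedy action gets
    probability 1 / (K + gamma * gap), where gap is its predicted shortfall
    relative to the greedy action; the greedy action gets the remainder.
    """
    k = len(predictions)
    best = max(range(k), key=lambda a: predictions[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predictions[best] - predictions[a]
            probs[a] = 1.0 / (k + gamma * gap)
    # Each non-greedy probability is at most 1/K, so the remainder is >= 1/K.
    probs[best] = 1.0 - sum(probs)
    return probs


class RunningMeanOracle:
    """Estimation step: a toy online regression oracle.

    It predicts the running mean reward for each (context, action) pair.
    Any online regressor exposing predict/update could be swapped in.
    """

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def predict(self, context, action):
        key = (context, action)
        n = self.counts.get(key, 0)
        return self.sums.get(key, 0.0) / n if n else 0.0

    def update(self, context, action, reward):
        key = (context, action)
        self.sums[key] = self.sums.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1


def run_bandit(oracle, contexts, num_actions, reward_fn, gamma, rng):
    """Run the loop: predict, sample an action, observe its reward, update."""
    total = 0.0
    for x in contexts:
        preds = [oracle.predict(x, a) for a in range(num_actions)]
        probs = inverse_gap_weighting(preds, gamma)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = reward_fn(x, action)  # only the chosen action's reward is seen
        oracle.update(x, action, reward)
        total += reward
    return total
```

In this sketch the oracle never sees the sampling rule and the sampling rule never sees how predictions are produced, so either piece can be replaced independently. A larger `gamma` concentrates probability on the greedy action; how it should be set is not stated in the abstract.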
–
Bio: Dylan Foster is a postdoctoral researcher at the MIT Institute for Foundations of Data Science. In 2019 he received his PhD in computer science at Cornell University, advised by Karthik Sridharan. His research focuses on theory for machine learning in real-world settings. He is especially interested in online machine learning problems such as contextual bandits and reinforcement learning. Dylan previously received his BS and MS in Electrical Engineering from USC in 2014. He has received awards including the best paper award at COLT (2019), best student paper award at COLT (2018, 2019), Facebook PhD fellowship, and NDSEG PhD fellowship.

MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764
