Beyond UCB: statistical complexity and optimal algorithm for non-linear ridge bandits
Yanjun Han, MIT
E18-304
Abstract: Much of the existing literature on bandits and reinforcement learning assumes a linear reward/value function, but what happens if the reward is non-linear? Two curious phenomena arise for non-linear bandits: first,…