Adversarial combinatorial bandits for imperfect-information sequential games
Abstract: This talk will focus on learning policies for tree-form decision problems (extensive-form games) from adversarial feedback. In principle, one could convert learning in any extensive-form game (EFG) into learning in an equivalent normal-form game (NFG), that is, a multi-armed bandit problem with one arm per tree-form policy. However, this conversion comes at the cost of an exponential blowup in the size of the strategy space. As a result, progress on NFGs and EFGs has historically followed separate tracks, with the EFG community often having…
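A small worked count illustrates where the exponential blowup comes from. The sketch below assumes a simplified tree-form decision problem built from three hypothetical node kinds (`decision`, `observation`, `terminal`). At a decision node a policy commits to one action, so counts add. At an observation node a policy must specify a plan for every possible signal, so counts multiply. The recursion counts reduced pure policies, that is, policies that specify actions only at reachable decision points. The helper names (`Node`, `count_policies`, `num_nodes`, `signal_then_act`) are illustrative and not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # kind is one of "decision", "observation", or "terminal" (illustrative encoding)
    kind: str
    children: List["Node"] = field(default_factory=list)


def count_policies(node: Node) -> int:
    """Number of reduced pure tree-form policies rooted at `node`."""
    if node.kind == "terminal":
        return 1
    counts = [count_policies(child) for child in node.children]
    if node.kind == "decision":
        # Choose exactly one action, then follow a policy below it.
        return sum(counts)
    # Observation node: a policy needs a plan for every possible signal.
    product = 1
    for c in counts:
        product *= c
    return product


def num_nodes(node: Node) -> int:
    """Size of the tree, i.e. the size of the tree-form representation."""
    return 1 + sum(num_nodes(child) for child in node.children)


def signal_then_act(num_signals: int, num_actions: int) -> Node:
    """One observation with `num_signals` outcomes, each followed by a decision."""
    decisions = [
        Node("decision", [Node("terminal") for _ in range(num_actions)])
        for _ in range(num_signals)
    ]
    return Node("observation", decisions)


if __name__ == "__main__":
    for n in (5, 10, 20):
        tree = signal_then_act(n, 2)
        print(n, num_nodes(tree), count_policies(tree))
```

The tree grows linearly in the number of signals, while the number of arms in the equivalent normal-form bandit grows as 2 to the power of the number of signals. For example, with 20 signals the tree has 61 nodes but there are 1,048,576 pure policies.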