Stochastics and Statistics Seminar
Active learning with seed examples and search queries
Speaker Name: Daniel Hsu (Columbia)
Date: April 14, 2017
Active learning is a framework for supervised learning that explicitly models, and permits one to control and optimize, the costs of labeling data. The hope is that by carefully selecting which examples to label in an adaptive manner, the number of labels required to learn an accurate classifier is substantially reduced. However, in many learning settings (e.g., when some classes are rare), it is difficult to identify which examples are most informative to label, and existing active learning algorithms are prone to labeling uninformative examples.
I'll describe some improvements to active learning algorithms, and to the active learning framework itself, that are better at identifying the informative examples. I'll formalize the common practice of using seed examples and database search in learning, and demonstrate its benefits in active learning.
Based on joint work with Alekh Agarwal, Alina Beygelzimer, Nicholas Herrera, TK Huang, John Langford, Rob Schapire, and Chicheng Zhang.