Stochastics and Statistics Seminar

Active learning with seed examples and search queries

Speaker Name: Daniel Hsu (Columbia)

Date: April 14, 2017

Time: 11:00am

Location: E18-304

Abstract:

Active learning is a framework for supervised learning that explicitly models, and permits one to control and optimize, the costs of labeling data. The hope is that by carefully selecting which examples to label in an adaptive manner, the number of labels required to learn an accurate classifier is substantially reduced. However, in many learning settings (e.g., when some classes are rare), it is difficult to identify which examples are most informative to label, and existing active learning algorithms are prone to labeling uninformative examples.
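To make the promised label savings concrete, here is a minimal Python sketch of the textbook case. It is an illustration, not an algorithm from the talk: learning a threshold classifier h(x) = 1[x >= t] on a pool of n points on the line. Assuming the pool's labels are realizable by some threshold, binary search over the sorted pool finds the boundary with about log2(n) label queries, whereas passive learning labels all n points. The function name `learn_threshold_actively` and the synthetic pool are hypothetical.

```python
from typing import Callable, List, Tuple


def learn_threshold_actively(
    xs: List[float], label: Callable[[float], int]
) -> Tuple[float, int]:
    """Actively learn h(x) = 1[x >= t] over a pool by binary search.

    Assumes the pool's labels are realizable by a threshold (all 0s, then
    all 1s once sorted). Returns (estimated threshold, label queries used);
    the threshold is +inf if no pool point is labeled 1.
    """
    pool = sorted(xs)
    lo, hi = 0, len(pool)  # index of the first positive lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # one call to the labeling oracle
        if label(pool[mid]) == 1:
            hi = mid  # first positive is at or before mid
        else:
            lo = mid + 1  # first positive is after mid
    threshold = pool[lo] if lo < len(pool) else float("inf")
    return threshold, queries


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]  # synthetic, evenly spaced pool
    true_t = 0.637  # hypothetical ground-truth threshold
    t_hat, q = learn_threshold_actively(pool, lambda x: int(x >= true_t))
    print(f"estimated threshold {t_hat}, {q} labels (passive: {len(pool)})")
```

With 1,000 pool points this uses at most 10 label queries. The search works only because labels are monotone along the line. For richer hypothesis classes, and especially when one class is rare, choosing informative queries is much harder.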

I'll describe some improvements to active learning algorithms, and to the active learning framework itself, that are better at identifying the informative examples. I'll formalize the common practice of using seed examples and database search in learning, and demonstrate its benefits in active learning.
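The value of a seed example when a class is rare can be seen in a toy of our own construction. It is not the formalization from the talk. Suppose the positive class is a single short interval of a sorted 1-D pool. Before any positive has been found, there is no boundary to bisect. With 10 positives among 1,000 points, querying uniformly at random without replacement takes (1000 + 1)/(10 + 1) = 91 queries on average to see the first positive. Given one seed positive, for instance one returned by a hypothetical database search, two binary searches recover both endpoints with roughly 2 log2(n) label queries. The function names and the single-interval assumption are illustrative.

```python
from typing import Callable, List, Tuple


def first_index_where(lo: int, hi: int, pred: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary search for the smallest i in [lo, hi) with pred(i) True.

    Assumes pred is monotone on the range (False ... False, True ... True).
    Returns (index, queries); index is hi if pred is never True.
    """
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # each pred call costs one label query
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


def learn_interval_from_seed(
    xs: List[float], label: Callable[[float], int], seed_index: int
) -> Tuple[float, float, int]:
    """Recover a single positive interval [a, b] given one seed positive.

    Assumes xs is sorted, the positives form one contiguous run, and
    label(xs[seed_index]) == 1 (a seed from a hypothetical search).
    Returns (a, b, label queries used), not counting the free seed.
    """

    def is_pos(i: int) -> bool:
        return label(xs[i]) == 1

    # Left of the seed, labels read 0 ... 0 1 ... 1: find the first positive.
    left, q_left = first_index_where(0, seed_index, is_pos)
    # Right of the seed, labels read 1 ... 1 0 ... 0: find the first negative.
    right_neg, q_right = first_index_where(
        seed_index + 1, len(xs), lambda i: not is_pos(i)
    )
    return xs[left], xs[right_neg - 1], q_left + q_right


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]  # synthetic pool

    def rare(x: float) -> int:
        return int(0.5 <= x <= 0.509)  # 10 positives out of 1,000

    a, b, q = learn_interval_from_seed(pool, rare, seed_index=503)
    print(f"positive interval [{a}, {b}] found with {q} label queries")
```

On this pool, with a seed at 0.503, the two searches use at most 18 label queries. The guarantee rests entirely on the single-interval assumption. With several disjoint positive regions, one seed reveals only the region it lies in.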

Based on joint work with Alekh Agarwal, Alina Beygelzimer, Nicholas Herrera, TK Huang, John Langford, Rob Schapire, and Chicheng Zhang.