Evaluating a black-box algorithm: stability, risk, and model comparisons
Abstract: When we run a complex algorithm on real data, it is standard to use a holdout set or a cross-validation strategy to evaluate its behavior and performance. In doing so, are we learning about the algorithm itself, or only about the particular fitted model(s) that this data set produced? In this talk, we will establish fundamental hardness results on the problem of empirically evaluating properties of a black-box algorithm, such as its stability and its average…
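
To make the distinction between evaluating a fitted model and evaluating the algorithm concrete, here is a minimal simulation sketch. It is not taken from the talk. The data model (y = 2x plus Gaussian noise), the toy "algorithm" (a least-squares slope through the origin), and all function names are assumptions chosen for illustration. A single holdout split scores one fitted model. The algorithm's risk, taken here to mean test error averaged over fresh training sets, is approximated by redrawing the whole data set many times, which is possible only because this is a simulation.

```python
import random

def draw_data(n, rng):
    """Draw n points from a hypothetical model: y = 2x + standard Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

def fit_slope(train):
    """Toy stand-in for the black-box algorithm: least-squares slope through the origin."""
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    return sxy / sxx

def mse(slope, data):
    """Mean squared prediction error of a fitted slope on a data set."""
    errors = [(y - slope * x) ** 2 for x, y in data]
    return sum(errors) / len(errors)

def holdout_estimate(data, n_train):
    """Fit on the first n_train points, then score on the remaining points."""
    slope = fit_slope(data[:n_train])
    return mse(slope, data[n_train:])

rng = random.Random(0)
n_train, n_test = 20, 20

# What we get in practice: one data set, one fitted model, one holdout number.
single = holdout_estimate(draw_data(n_train + n_test, rng), n_train)

# What we can do only in simulation: redraw the entire data set many times.
# The average approximates the algorithm's risk at training size n_train;
# the spread shows how far a single holdout number can sit from that average.
repeats = [
    holdout_estimate(draw_data(n_train + n_test, rng), n_train)
    for _ in range(2000)
]
algorithm_risk = sum(repeats) / len(repeats)
spread = (sum((r - algorithm_risk) ** 2 for r in repeats) / (len(repeats) - 1)) ** 0.5

print(f"single holdout estimate:       {single:.3f}")
print(f"average over fresh data sets:  {algorithm_risk:.3f}")
print(f"std. dev. across data sets:    {spread:.3f}")
```

In this toy setting the single holdout number is a noisy view of one fitted model, while the averaged quantity describes the algorithm. With real data only the first is directly available, which is the gap the abstract asks about.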