Less is more: optimal learning by subsampling and regularization
In this talk, I will discuss the prediction properties of techniques commonly used to scale up kernel methods and Gaussian processes. In particular, I will focus on data-dependent and data-independent subsampling methods, namely Nyström and random features, and study their generalization properties within a statistical learning theory framework. On the one hand, I will show that these methods can achieve optimal learning errors while being computationally efficient. On the other hand, I will show that subsampling can be seen…
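To make the two subsampling schemes concrete, here is a minimal NumPy sketch for least-squares regression with a Gaussian kernel. It is an illustration under stated assumptions, not the speaker's code. The assumptions are: a Gaussian kernel with bandwidth `sigma`, Nyström centers drawn uniformly at random from the training inputs (data-dependent), random Fourier features for the same kernel (data-independent), and a ridge penalty scaled as `lam * n`. The function names `gaussian_kernel`, `ridge`, `nystrom_krr`, and `random_features_rr`, as well as every numerical value, are hypothetical. Both estimators replace the n × n kernel system with an m × m or D × D one, where m and D are much smaller than n.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every pair of rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def ridge(Phi, y, lam):
    """Ridge regression weights; the penalty is scaled by n as in kernel ridge regression."""
    n, p = Phi.shape
    return np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(p), Phi.T @ y)

def nystrom_krr(X, y, m, lam, sigma=1.0, seed=None, tol=1e-10):
    """Hypothetical Nyström KRR: m centers sampled uniformly (data-dependent) from X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(X.shape[0], size=m, replace=False)]
    evals, evecs = np.linalg.eigh(gaussian_kernel(C, C, sigma))
    keep = evals > tol * evals.max()           # drop numerically null directions of K_mm
    T = evecs[:, keep] / np.sqrt(evals[keep])  # maps k(x, C) to Nyström features
    w = ridge(gaussian_kernel(X, C, sigma) @ T, y, lam)
    coef = T @ w                               # expansion coefficients on the m centers
    return lambda Xt: gaussian_kernel(Xt, C, sigma) @ coef

def random_features_rr(X, y, D, lam, sigma=1.0, seed=None):
    """Hypothetical ridge regression on D random Fourier features (data-independent)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))  # frequencies from N(0, I / sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    feats = lambda Z: np.sqrt(2.0 / D) * np.cos(Z @ W + b)  # z(x)^T z(x') approximates k(x, x')
    w = ridge(feats(X), y, lam)
    return lambda Xt: feats(Xt) @ w

# Usage on synthetic 1-D data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
Xt = np.linspace(-3.0, 3.0, 200)[:, None]

f_ny = nystrom_krr(X, y, m=50, lam=1e-6, seed=1)
f_rf = random_features_rr(X, y, D=100, lam=1e-6, seed=2)
mse_ny = float(np.mean((f_ny(Xt) - np.sin(Xt[:, 0])) ** 2))
mse_rf = float(np.mean((f_rf(Xt) - np.sin(Xt[:, 0])) ** 2))
print(f"Nystrom test MSE: {mse_ny:.2e}, random features test MSE: {mse_rf:.2e}")
```

The Nyström solver is written through an eigendecomposition of the small m × m kernel matrix, so that the ridge system it solves stays well conditioned even when that matrix is numerically singular; in exact arithmetic it gives the same predictor as solving with the full m × m system directly. In both cases `m`, `D`, and `lam` are the knobs whose joint choice governs the trade-off between computation and statistical accuracy.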