How should we do linear regression?
April 25 @ 11:00 am - 12:00 pm
Richard Samworth, University of Cambridge
E18-304
Abstract: In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
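The abstract compresses a full pipeline into a few sentences; the Python sketch below makes the steps concrete. It is a toy illustration under loudly stated assumptions, not the authors' procedure: the score of the noise distribution is estimated with a plain Gaussian-kernel density estimator, the "best decreasing approximation" is approximated by an antitonic (decreasing isotonic) regression as a stand-in for the paper's projection with respect to the Fisher divergence, and the pilot fit uses least absolute deviations. All variable names and tuning choices are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_cauchy(n)   # heavy-tailed, non-log-concave noise

# (1) Robust pilot fit (least absolute deviations); OLS is unreliable
#     here because Cauchy errors have no finite mean.
def lad(b):
    return np.abs(y - X @ b).mean()

beta_pilot = minimize(lad, np.zeros(p), method="Powell").x
r = np.sort(y - X @ beta_pilot)

# (2) Kernel estimate of the score psi(x) = (log f)'(x) = f'(x)/f(x)
#     at the residuals; MAD-based bandwidth since the variance is infinite.
sigma = np.median(np.abs(r - np.median(r))) / 0.6745
h = 1.06 * sigma * n ** (-0.2)
D = (r[:, None] - r[None, :]) / h
K = np.exp(-0.5 * D ** 2)
psi_raw = (-D * K).sum(axis=1) / (h * K.sum(axis=1))

# (3) Best decreasing approximation of the estimated score: a simple
#     antitonic regression stands in for the Fisher-divergence projection.
psi_dec = IsotonicRegression(increasing=False).fit_transform(r, psi_raw)

# (4) Convex loss via ell'(x) = -psi(x): psi decreasing => ell convex.
#     Integrate numerically on a grid (trapezoid rule).
grid = np.linspace(r[0] - 10 * h, r[-1] + 10 * h, 4000)
psi_grid = np.interp(grid, r, psi_dec)       # constant beyond residual range
ell_grid = np.concatenate(
    ([0.0], np.cumsum(-0.5 * (psi_grid[1:] + psi_grid[:-1]) * np.diff(grid)))
)

def risk(beta):                              # empirical risk under the fitted loss
    return np.interp(y - X @ beta, grid, ell_grid).mean()

def grad(beta):                              # gradient: (1/n) * X^T psi(residuals)
    return X.T @ np.interp(y - X @ beta, r, psi_dec) / n

beta_hat = minimize(risk, beta_pilot, jac=grad, method="BFGS").x
print("pilot:", np.round(beta_pilot, 3), "refit:", np.round(beta_hat, 3))
```

The key structural point the sketch preserves is the one the abstract turns on: because the fitted score is decreasing, the induced loss (whose derivative is the negated score) is convex, so the downstream M-estimation is a convex optimisation problem. Under Cauchy noise the fitted score is bounded, so the resulting loss grows roughly linearly in the tails, consistent with the Huber-like shape described in the abstract.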
Bio: Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a full professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in high-dimensional and nonparametric statistics; he has developed methods and theory for shape-constrained inference, missing data, subgroup selection, data perturbation techniques (subsampling, the bootstrap, random projections, knockoffs), changepoint estimation and independence testing, amongst others. Richard currently holds a European Research Council Advanced Grant. He received the COPSS Presidents’ Award in 2018, was elected a Fellow of the Royal Society in 2021 and served as co-editor of the Annals of Statistics (2019-2021).