Towards a ‘Chemistry of AI’: Unveiling the Structure of Training Data for more Scalable and Robust Machine Learning

David Alvarez-Melis, Harvard University
E18-304

Abstract: Recent advances in AI have underscored that data, rather than model size, is now the primary bottleneck in large-scale machine learning performance. Yet, despite this shift, systematic methods for dataset curation, augmentation, and optimization remain underdeveloped. In this talk, I will argue for the need for a "Chemistry of AI": a paradigm that, like the emerging "Physics of AI," embraces a principles-first, rigorous, empiricist approach but shifts the focus from models to data. This perspective treats datasets as structured, dynamic…

How should we do linear regression?

Richard Samworth, University of Cambridge
E18-304

Abstract: In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence.…


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764