On the statistical cost of score matching

Abstract:
Energy-based models are a recent class of probabilistic generative models wherein the distribution being learned is parametrized only up to a constant of proportionality (i.e., a partition function). Fitting such models using maximum likelihood (i.e., finding the parameters which maximize the probability of the observed data) is computationally challenging, as evaluating the partition function involves a high-dimensional integral. Thus, newer incarnations of this paradigm instead optimize other losses that obviate the need to evaluate partition functions. Prominent examples include score matching (in which we fit the score of the data distribution) and noise contrastive estimation (in which we set up a classification problem to distinguish data from noise).
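To make the "no partition function" point concrete, here is a minimal sketch of one-dimensional score matching in Hyvärinen's form, J(θ) = E[½ s_θ(x)² + s_θ'(x)] with s_θ(x) = d/dx log p_θ(x). Under mild boundary conditions this equals the squared score-fitting error up to a constant, and it depends only on derivatives of the energy. The toy energy E_θ(x) = θx⁴, the finite-difference step `h`, the function names, and the grid search are illustrative assumptions, not taken from the talk or the referenced papers.

```python
import numpy as np

def energy(x, theta):
    # Hypothetical toy energy: p_theta(x) ∝ exp(-energy(x, theta)) = exp(-theta * x**4).
    return theta * x**4

def score_matching_loss(x, theta, h=1e-3):
    """Empirical 1-D score matching objective (Hyvärinen form) on samples x.

    J(theta) = mean(0.5 * s(x)**2 + s'(x)), where s(x) = d/dx log p_theta(x) = -dE/dx.
    Only derivatives of the energy appear, so the partition function is never evaluated.
    """
    e_plus, e_mid, e_minus = energy(x + h, theta), energy(x, theta), energy(x - h, theta)
    score = -(e_plus - e_minus) / (2 * h)                 # central difference for -E'(x)
    score_deriv = -(e_plus - 2 * e_mid + e_minus) / h**2  # central difference for -E''(x)
    return np.mean(0.5 * score**2 + score_deriv)

# Draw exact samples from the model with theta = 1:
# if Y ~ Gamma(1/4, 1), then X = ±Y**(1/4) has density proportional to exp(-x**4).
rng = np.random.default_rng(0)
n = 20_000
x = rng.choice([-1.0, 1.0], size=n) * rng.gamma(shape=0.25, scale=1.0, size=n) ** 0.25

# Fit theta by minimizing the score matching loss over a grid (no partition function needed).
grid = np.linspace(0.5, 2.0, 301)
losses = [score_matching_loss(x, t) for t in grid]
theta_hat = grid[int(np.argmin(losses))]
print(f"score matching estimate of theta: {theta_hat:.3f}")
```

For this particular energy the objective is quadratic in θ, so a closed-form minimizer exists; the grid search is only there to show that fitting needs nothing beyond the energy function.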

What’s gained with these approaches is tractable, gradient-based training algorithms. What’s lost is less clear: for example, since maximum likelihood is asymptotically optimal in terms of statistical efficiency, how suboptimal are losses like score matching? We will provide partial answers to this question and, in the process, uncover connections between geometric properties of the distribution (Poincaré and isoperimetric constants) and the statistical efficiency of score matching.
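As a toy illustration of the efficiency question (a hypothetical example, not one from the talk or the referenced papers), the sketch below uses the one-parameter model p_θ(x) ∝ exp(-θx⁴), for which both the maximum likelihood and the score matching estimators have closed forms, and compares their Monte Carlo variances. Because this is a regular exponential family, the MLE attains the Cramér-Rao bound asymptotically, so the score matching estimate should not have smaller variance for large sample sizes. The sample size, number of trials, seed, and function names are arbitrary assumptions.

```python
import numpy as np

def sample_quartic(rng, theta, n):
    # Exact sampler for p(x) ∝ exp(-theta * x**4): if Y ~ Gamma(1/4, 1), then |X| = (Y / theta)**(1/4).
    y = rng.gamma(shape=0.25, scale=1.0, size=n)
    return rng.choice([-1.0, 1.0], size=n) * (y / theta) ** 0.25

def theta_mle(x):
    # log p(x) = -theta * x**4 + 0.25 * log(theta) + const, so the MLE is 1 / (4 * mean(x**4)).
    return 1.0 / (4.0 * np.mean(x**4))

def theta_score_matching(x):
    # The score is -4 * theta * x**3 and its x-derivative is -12 * theta * x**2, so the objective
    # mean(8 * theta**2 * x**6 - 12 * theta * x**2) is minimized at 3 * mean(x**2) / (4 * mean(x**6)).
    return 3.0 * np.mean(x**2) / (4.0 * np.mean(x**6))

def compare_estimators(theta=1.0, n=2_000, trials=500, seed=0):
    """Monte Carlo mean and variance of both estimators over independent datasets of size n."""
    rng = np.random.default_rng(seed)
    mle, sm = np.empty(trials), np.empty(trials)
    for t in range(trials):
        x = sample_quartic(rng, theta, n)
        mle[t], sm[t] = theta_mle(x), theta_score_matching(x)
    return mle.mean(), mle.var(), sm.mean(), sm.var()

mle_mean, mle_var, sm_mean, sm_var = compare_estimators()
print(f"MLE:            mean {mle_mean:.4f}, variance {mle_var:.2e}")
print(f"score matching: mean {sm_mean:.4f}, variance {sm_var:.2e}")
```

The MLE here is only available because the partition function of this toy model is known in closed form (it scales as θ^(-1/4)); for realistic energy-based models that is exactly what is unavailable.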

Based primarily on https://arxiv.org/abs/2210.00726, https://arxiv.org/abs/2210.00189, https://arxiv.org/abs/2110.11271

Bio:
Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. Prior to joining CMU, he was a Norbert Wiener Research Fellow jointly in the Applied Math department and IDSS at MIT. He received his PhD in Computer Science from Princeton University. His research interests lie at the intersection of machine learning, statistics and theoretical computer science, spanning topics like (probabilistic) generative models, algorithmic tools for learning and inference, representation and self-supervised learning, out-of-distribution generalization and applications of neural approaches to natural language processing and scientific domains. Andrej is the recipient of an Amazon Research Award (“Causal + Deep Out-of-Distribution Learning”) and an NSF CAREER Award (“Theoretical Foundations of Modern Machine Learning Paradigms: Generative and Out-of-Distribution”).
