Do Large Language Models (Really) Need Statistical Foundations?

October 10, 2025, 11:00 am to 12:00 pm
Weijie Su, University of Pennsylvania
E18-304

Abstract:
In this talk, we advocate for developing statistical foundations for large language models (LLMs). We begin by examining two key characteristics that necessitate statistical perspectives on LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the inherent complexity and black-box nature of Transformer architectures. To illustrate how statistical insights can advance LLM development and applications, we present two examples. First, we identify statistical inconsistencies and biases arising from the current approach to aligning LLMs with human preferences, and we propose a regularization term for aligning LLMs that is both necessary and sufficient to ensure consistent alignment. Second, we introduce a novel statistical framework for analyzing the efficacy of watermarking schemes, with a focus on a watermarking scheme developed by OpenAI for which we derive optimal detection rules that outperform existing ones. Time permitting, we will explore how statistical principles can inform rigorous evaluation of LLMs. Collectively, these findings demonstrate how statistical insights can effectively address several pressing challenges emerging from LLMs.
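For context on the watermarking example, below is a minimal sketch assuming the scheme in question is the Gumbel-max watermark commonly attributed to OpenAI (Scott Aaronson's proposal), paired with a standard baseline detection statistic rather than the optimal rules derived in the talk. All names, the seeding convention, and the score function here are illustrative assumptions, not the speaker's code.

# Sketch of the Gumbel-max watermark and a baseline detection statistic.
# Assumption: this is the OpenAI scheme referenced in the abstract; the
# optimal rules from the talk would replace the score function below.
import hashlib
import numpy as np

VOCAB_SIZE = 8          # toy vocabulary; real models use ~10^5 tokens
KEY = b"secret-key"     # shared between the LLM provider and the verifier

def pseudorandom_uniforms(key: bytes, context: tuple[int, ...]) -> np.ndarray:
    """Derive vocabulary-sized uniforms from the key and recent context.
    Generation and detection must reproduce the same values."""
    digest = hashlib.sha256(key + repr(context).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).uniform(size=VOCAB_SIZE)

def watermarked_sample(probs: np.ndarray, key: bytes,
                       context: tuple[int, ...]) -> int:
    """Gumbel-max trick: argmax_i U_i^(1/p_i) has distribution exactly p,
    so a single watermarked draw does not distort the model's output."""
    u = pseudorandom_uniforms(key, context)
    return int(np.argmax(u ** (1.0 / np.maximum(probs, 1e-12))))

def detection_statistic(tokens: list[int], key: bytes, window: int = 4) -> float:
    """Recompute the pivot Y_t = U_{t, w_t}. For human text Y_t ~ Unif(0,1);
    under the watermark Y_t concentrates near 1. Baseline score: -log(1 - Y)."""
    score = 0.0
    for t in range(window, len(tokens)):
        u = pseudorandom_uniforms(key, tuple(tokens[t - window:t]))
        y = u[tokens[t]]
        score += -np.log(1.0 - y)
    return score  # under the null, the sum follows a Gamma(n, 1) distribution

The pivotal statistic Y_t is what makes a statistical treatment natural: its null distribution is exactly uniform regardless of the unknown next-token probabilities, so detection reduces to choosing a score function, which is where optimality questions arise.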

Biography:
Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department at the University of Pennsylvania and a co-director of the Penn Research in Machine Learning (PRiML) Center. Prior to joining Penn, he received his Ph.D. in statistics from Stanford University in 2016 and his bachelor's degree in mathematics from Peking University in 2011. His research interests span the statistical foundations of generative AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of the American Statistical Association, the Journal of Machine Learning Research, the Annals of Applied Statistics, the Harvard Data Science Review, Foundations and Trends in Statistics, Operations Research, and the Journal of the Operations Research Society of China, and he is currently guest editing a special issue on "Statistics for Large Language Models and Large Language Models for Statistics" in Stat. His work has been recognized with several awards, including the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, ICBS Frontiers of Science Award in Mathematics, IMS Medallion Lectureship, and Outstanding Young Talent Award in the 2025 China Annual Review of Mathematics. He is a Fellow of the IMS.