Neural networks: optimization, transition to linearity and deviations therefrom
April 4, 2022 @ 4:00 pm - 5:00 pm
Mikhail Belkin (UC San Diego)
Abstract: The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. I will first discuss some general mathematical principles allowing for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. I will argue that optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter space, allowing for efficient optimization by gradient descent or SGD.
As a separate but related development, I will talk about the remarkable recently discovered phenomenon of transition to linearity (constancy of NTK), when networks become linear functions of their parameters as their width increases. In particular, I will talk about a quite general form of the transition to linearity for a broad class of feed-forward networks corresponding to arbitrary directed graphs. It turns out that the width of such networks is characterized by the minimum in-degree of their graphs, excluding the input layer and the first layer.
Finally, I will mention a very interesting deviation from linearity, a so-called “catapult phase”, a recently identified non-linear and, furthermore, non-perturbative phenomenon, which persists even as neural networks become increasingly linear in the limit of increasing width.
Based on joint work with Chaoyue Liu, Libin Zhu, Adit Radhakrishnan
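For readers unfamiliar with the Polyak-Lojasiewicz (PL) condition mentioned in the abstract, a commonly used textbook form is sketched below. The notation is an assumption for illustration and is not taken from the abstract: L is the loss, w the parameters, L^* the infimum of L, mu > 0 the PL constant, and beta a smoothness constant. The talk may use a variant that holds only on part of the parameter space.

```latex
% PL condition (standard form; notation assumed, not from the abstract)
\[
  \tfrac{1}{2}\,\bigl\|\nabla L(w)\bigr\|^{2} \;\ge\; \mu\,\bigl(L(w) - L^{*}\bigr),
  \qquad \mu > 0 .
\]
% If L is additionally \beta-smooth, gradient descent with step size 1/\beta,
%   w_{t+1} = w_t - \tfrac{1}{\beta}\,\nabla L(w_t),
% converges linearly in loss value, without any convexity assumption:
\[
  L(w_{t}) - L^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)^{t}\,\bigl(L(w_{0}) - L^{*}\bigr).
\]
```

The condition asks only that the gradient be large wherever the loss is far from its minimum, which is why it can hold for non-convex problems.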
About the speaker: Mikhail Belkin received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in the theory and applications of machine learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral analysis to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings is the “double descent” risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation.
Mikhail Belkin is a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Pattern Analysis and Machine Intelligence and SIAM Journal on Mathematics of Data Science.