A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Net
December 4 @ 11:00 am - 12:00 pm
Rong Ge - Duke University
Abstract: The training of neural networks optimizes complex non-convex objective functions, yet in practice simple algorithms achieve strong performance. Recent works suggest that over-parameterization could be a key ingredient in explaining this discrepancy. However, current theories cannot fully explain the role of over-parameterization. In particular, they either work in a regime where neurons don't move much, or require a large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show that as long as the loss is already lower than a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons, and gives explicit bounds on convergence rates that are independent of the number of student neurons. Based on joint work with Mo Zhou and Chi Jin.
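The teacher-student setting the abstract refers to can be illustrated with a minimal sketch (a hypothetical toy experiment, not the paper's construction or proof): a student two-layer ReLU network with more neurons than a fixed teacher is trained by gradient descent on a squared loss over random inputs, and the loss decreases as the student fits the teacher.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_teacher, m_student = 5, 3, 6   # input dim; student is mildly over-parameterized

# Fixed teacher network: f*(x) = sum_j a*_j relu(w*_j . x)
W_t = rng.standard_normal((m_teacher, d))
a_t = rng.standard_normal(m_teacher)

# Student network with random small initialization
W_s = 0.5 * rng.standard_normal((m_student, d))
a_s = 0.5 * rng.standard_normal(m_student)

relu = lambda z: np.maximum(z, 0.0)

def forward(W, a, X):
    """Two-layer net: second-layer weights a on ReLU features."""
    return relu(X @ W.T) @ a

# Fixed Gaussian sample approximating the population loss
X = rng.standard_normal((2000, d))
y = forward(W_t, a_t, X)

lr, losses = 0.05, []
for step in range(2000):
    pred = forward(W_s, a_s, X)
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    losses.append(loss)
    H = relu(X @ W_s.T)                   # hidden activations, shape (n, m_student)
    mask = (X @ W_s.T > 0).astype(float)  # ReLU derivative
    # Gradients of the mean squared loss w.r.t. student parameters
    grad_a = H.T @ err / len(X)
    grad_W = ((err[:, None] * mask) * a_s).T @ X / len(X)
    a_s -= lr * grad_a
    W_s -= lr * grad_W

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this toy run the loss drops as student neurons specialize; the paper's result concerns the regime where the loss is already below a polynomial threshold, from which convergence of each student neuron to a teacher neuron is guaranteed.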
Bio: Rong Ge is an assistant professor at Duke University. He received his Ph.D. from Princeton University, advised by Sanjeev Arora. Before joining Duke, he was a postdoc at Microsoft Research New England. Rong Ge's research focuses on proving theoretical guarantees for modern machine learning algorithms, and on understanding non-convex optimization, in particular for neural networks. He has received an NSF CAREER award and a Sloan Fellowship.