A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Networks
Abstract: Training a neural network requires optimizing a complex, non-convex objective function, yet in practice simple algorithms achieve strong performance. Recent work suggests that over-parameterization could be a key ingredient in explaining this discrepancy. However, current theories do not fully explain the role of over-parameterization: they either operate in a regime where individual neurons barely move, or require a very large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show…
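For context, here is a sketch of the setting the abstract refers to; the notation below is illustrative and not taken from the paper itself. A two-layer network with $m$ hidden neurons and activation $\sigma$ computes

$$ f(x; W, a) = \sum_{j=1}^{m} a_j\, \sigma(w_j^\top x), \qquad L(W, a) = \tfrac{1}{2}\, \mathbb{E}_{x}\!\left[\big(f(x; W, a) - f^*(x)\big)^2\right], $$

where $f^*$ is a target (teacher) network with $m^* \le m$ neurons. Under this reading, "mildly over-parameterized" would mean that $m$ exceeds $m^*$ by only a modest factor, in contrast to analyses that require $m$ to be very large, and the loss $L$ is non-convex in $(W, a)$, which is the difficulty the abstract alludes to.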