Stochastics and Statistics Seminar Rong Ge - Duke University
A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Net
Abstract: The training of neural networks optimizes complex non-convex objective functions, yet in practice simple algorithms achieve great performance. Recent works suggest that over-parameterization could be a key ingredient in explaining this discrepancy. However, current theories cannot fully explain the role of over-parameterization. In particular, they either work in a regime where neurons don't…
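As background for the setting the talk describes, the following is a minimal sketch (not the paper's construction) of a mildly over-parameterized two-layer ReLU network trained by plain gradient descent on a synthetic regression task; the width `m`, data sizes, learning rate, and the convention of fixing the second layer are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): n samples, input dim d, hidden width m > n
# so the network is over-parameterized for this dataset.
n, d, m = 32, 5, 64
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))  # smooth synthetic target

# First-layer weights trained; second layer fixed to random signs,
# a common simplification in two-layer theory.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(X, W):
    H = np.maximum(X @ W.T, 0.0)  # ReLU activations, shape (n, m)
    return H @ a, H

def loss(pred):
    return 0.5 * np.mean((pred - y) ** 2)

lr = 0.5
initial = loss(forward(X, W)[0])
for _ in range(500):
    pred, H = forward(X, W)
    resid = (pred - y) / n                            # dL/dpred
    # Gradient w.r.t. W, chaining through the ReLU indicator (H > 0).
    G = ((H > 0) * (resid[:, None] * a[None, :])).T @ X
    W -= lr * G
final = loss(forward(X, W)[0])
```

With enough width, plain gradient descent on this non-convex objective typically drives the training loss down steadily, which is the empirical behavior the abstract refers to.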