Emergent outlier subspaces in high-dimensional stochastic gradient descent
Abstract: It has been empirically observed that the spectrum of neural network Hessians after training have a bulk concentrated near zero, and a few outlier eigenvalues. Moreover, the eigenspaces associated to these outliers have been associated to a low-dimensional subspace in which most of the training occurs, and this implicit low-dimensional structure has been used as a heuristic for the success of high-dimensional classification. We will describe recent rigorous results in this direction for the Hessian spectrum over the course…