Saddle-to-saddle dynamics in diagonal linear networks
Abstract: When neural networks are trained with gradient methods from a small weight initialisation, peculiar learning curves are observed: training initially makes minimal progress, then undergoes a sudden transition in which a new "feature" is rapidly learned. This pattern is commonly known as incremental learning. In this talk, I will show that this phenomenon can be fully understood in a simplified network architecture, the diagonal linear network. In this setting, we can establish that the gradient flow trajectory transitions from one saddle point of the training…
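
The plateau-then-jump pattern described in the abstract can be reproduced in a toy setting. The sketch below is an illustration only, and everything in it is an assumption rather than the talk's setup: the parametrisation beta = u^2 - v^2 (one common way of writing a diagonal linear network), a fully decoupled squared loss (as if the features were orthonormal), plain gradient descent with a small step as a stand-in for gradient flow, and the hypothetical names `train_diagonal_net` and `first_step_reaching_half`.

```python
def train_diagonal_net(beta_star, alpha=1e-4, lr=1e-3, steps=20000):
    """Gradient descent on L(u, v) = 0.5 * sum_i (u_i^2 - v_i^2 - beta_star_i)^2.

    Assumed toy model: each coordinate evolves independently, and u, v both
    start at the small value alpha, so the effective weights start at 0.
    """
    d = len(beta_star)
    u = [alpha] * d
    v = [alpha] * d
    betas = [[0.0] * d]  # u_i^2 - v_i^2 = 0 at initialisation
    losses = [0.5 * sum(b * b for b in beta_star)]
    for _ in range(steps):
        for i in range(d):
            r = u[i] ** 2 - v[i] ** 2 - beta_star[i]  # residual of coordinate i
            # dL/du_i = 2 * u_i * r and dL/dv_i = -2 * v_i * r
            u[i], v[i] = u[i] - lr * 2 * u[i] * r, v[i] + lr * 2 * v[i] * r
        beta = [u[i] ** 2 - v[i] ** 2 for i in range(d)]
        betas.append(beta)
        losses.append(0.5 * sum((beta[i] - beta_star[i]) ** 2 for i in range(d)))
    return betas, losses


def first_step_reaching_half(betas, beta_star, coord):
    """First iteration at which coordinate `coord` reaches half its target."""
    for step, beta in enumerate(betas):
        if abs(beta[coord]) >= 0.5 * abs(beta_star[coord]):
            return step
    return None


if __name__ == "__main__":
    target = [3.0, 1.0]
    betas, losses = train_diagonal_net(target)
    for k in range(len(target)):
        print("coordinate", k, "learned around step",
              first_step_reaching_half(betas, target, k))
    # Sample the loss curve: it stays flat, drops, stays flat, then drops again.
    for step in (0, 1000, 3000, 6000, 20000):
        print("step", step, "loss", losses[step])
```

In this decoupled toy, each coordinate sits near zero for a time that grows as alpha shrinks and is then fitted quickly, with the larger target fitted first. Between the two jumps the iterate lingers near a point where only the first coordinate has been fitted, which is a saddle of this toy loss.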