BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//MIT Statistics and Data Science Center - ECPv5.14.2.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:MIT Statistics and Data Science Center
X-ORIGINAL-URL:https://stat.mit.edu
X-WR-CALDESC:Events for MIT Statistics and Data Science Center
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20200308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20201101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20201204T110000
DTEND;TZID=America/New_York:20201204T120000
DTSTAMP:20220528T121506Z
CREATED:20200901T175142Z
LAST-MODIFIED:20201124T144957Z
UID:4308-1607079600-1607083200@stat.mit.edu
SUMMARY:A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Net
DESCRIPTION:Abstract: The training of neural networks optimizes complex non-convex objective functions\, yet in practice simple algorithms achieve strong performance. Recent works suggest that over-parameterization could be a key ingredient in explaining this discrepancy. However\, current theories cannot fully explain the role of over-parameterization. In particular\, they either work in a regime where neurons don’t move much\, or require a large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show that as long as the loss is already lower than a threshold (polynomial in the relevant parameters)\, every student neuron in an over-parameterized two-layer neural network will converge to one of the teacher neurons\, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons\, and gives explicit bounds on convergence rates that are independent of the number of student neurons. Based on joint work with Mo Zhou and Chi Jin. \n– \nBio: Rong Ge is an assistant professor at Duke University. He received his Ph.D. from Princeton University\, advised by Sanjeev Arora. Before joining Duke\, he was a post-doc at Microsoft Research New England. Rong Ge’s research focuses on proving theoretical guarantees for modern machine learning algorithms\, and on understanding non-convex optimization\, in particular for neural networks. He has received an NSF CAREER award and a Sloan Fellowship.
URL:https://stat.mit.edu/calendar/ge/
LOCATION:online
CATEGORIES:Stochastics and Statistics Seminar
END:VEVENT
END:VCALENDAR