Size-Independent Sample Complexity of Neural Networks

Ohad Shamir (Weizmann Institute)
E18-304

Abstract: I'll describe new bounds on the sample complexity of deep neural networks, based on the norms of the parameter matrices at each layer. In particular, we show how certain norms lead to the first explicit bounds which are fully independent of the network size (both depth and width), and are therefore applicable to arbitrarily large neural networks. These results are derived using some novel techniques, which may be of independent interest. Joint work with Noah Golowich (Harvard) and Alexander…
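To fix ideas, here is the generic shape of a norm-based generalization bound (a schematic only, not the exact theorem from the talk): if inputs satisfy $\|x\| \le B$ and the layer matrices of a depth-$d$ network $f$ satisfy $\prod_{j=1}^{d} \|W_j\|_F \le M$, such bounds take the form

$$L(f) \;\le\; \widehat{L}_n(f) + \tilde{O}\!\left(\frac{B\,M}{\sqrt{n}}\right),$$

where $n$ is the sample size and $\tilde{O}$ hides logarithmic and lower-order factors. Earlier bounds of this type carried extra factors exponential in the depth or polynomial in the width; the point of the talk is obtaining bounds that depend only on such norms, with no explicit dependence on the architecture.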


Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Adel Javanmard (USC)
E18-304

Abstract: Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers’ valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers’ valuations, i.e., buyers’ preferences. The seller’s goal is to design a learning policy that sets reserve prices by observing past sales data, and her objective is…
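As a toy illustration of the setup (not the policy from the talk), the Python sketch below assumes truthful bids, a linear valuation model, and full feedback after every round; a least-squares estimate of the unknown preference vector drives a conservative reserve price:

# Toy contextual reserve pricing (illustrative only; not the talk's policy).
# Assumes truthful bids and valuations linear in the context.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 2000
beta = rng.normal(size=d) / np.sqrt(d)   # unknown buyer preference vector

X = np.zeros((T, d))                     # past contexts
y = np.zeros(T)                          # past valuation feedback (a simplification)
revenue = 0.0
for t in range(T):
    x = rng.normal(size=d) / np.sqrt(d)            # context of item t
    v = x @ beta + 0.1 * rng.normal()              # highest buyer valuation
    if t < 20:
        r = 0.0                                    # cold start: no reserve
    else:
        # estimate preferences from past sales data by least squares
        beta_hat = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
        r = x @ beta_hat - 0.05                    # conservative reserve price
    if v >= r:
        revenue += max(r, 0.0)                     # simplified: sale at reserve
    X[t], y[t] = x, v                              # full feedback (a simplification)
print(f"average revenue per round: {revenue / T:.3f}")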


Fitting a putative manifold to noisy data

Hariharan Narayanan (Tata Institute of Fundamental Research, Mumbai)
E18-304

Abstract: We give a solution to the following question from manifold learning. Suppose data belonging to a high-dimensional Euclidean space is drawn i.i.d. from a measure supported on a low-dimensional, twice-differentiable, compact embedded manifold M, and is corrupted by a small amount of i.i.d. Gaussian noise. How can we produce a manifold M′ whose Hausdorff distance to M is small and whose reach (normal injectivity radius) is not much smaller than the reach of M? This…
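As a toy illustration of the setting (not the construction from the talk, which comes with Hausdorff and reach guarantees that naive smoothing does not supply), one can denoise a point cloud by projecting each point onto the top principal directions of its neighborhood:

# Local-PCA denoising of a noisy circle (toy illustration only).
import numpy as np

rng = np.random.default_rng(1)
n, sigma, k, dim = 500, 0.05, 25, 1     # samples, noise level, neighbors, intrinsic dim
theta = rng.uniform(0, 2 * np.pi, n)
clean = np.c_[np.cos(theta), np.sin(theta)]
noisy = clean + sigma * rng.normal(size=(n, 2))

denoised = np.empty_like(noisy)
for i, p in enumerate(noisy):
    # k nearest neighbors of p in the noisy sample
    idx = np.argsort(((noisy - p) ** 2).sum(axis=1))[:k]
    nbrs = noisy[idx]
    mu = nbrs.mean(axis=0)
    # top principal directions of the neighborhood (local tangent estimate)
    _, _, vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    tangent = vt[:dim]
    denoised[i] = mu + (p - mu) @ tangent.T @ tangent

err = np.abs(np.linalg.norm(denoised, axis=1) - 1.0).mean()
print(f"mean distance to the true circle: {err:.4f}")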


Resource-efficient ML in 2 KB RAM for the Internet of Things

Prateek Jain (Microsoft Research)
E18-304

Abstract: We propose an alternative paradigm for the Internet of Things (IoT) where machine learning algorithms run locally on severely resource-constrained edge and endpoint devices without necessarily needing cloud connectivity. This enables many scenarios beyond the pale of the traditional paradigm, including low-latency brain implants, precision agriculture on disconnected farms, privacy-preserving smart spectacles, etc. Towards this end, we develop novel tree- and kNN-based algorithms, called Bonsai and ProtoNN, for efficient prediction on IoT devices -- such as those based…
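A minimal sketch of the prototype-plus-projection idea behind a ProtoNN-style predictor (inference only, with random parameters for illustration; the actual method trains the projection W, prototypes B, and label matrix Z jointly under sparsity budgets to meet the memory constraint):

# ProtoNN-style inference sketch: project into low dimension, then score
# classes by RBF similarity to a small set of prototypes.
import numpy as np

rng = np.random.default_rng(2)
D, d, m, L = 64, 8, 10, 3          # input dim, projected dim, prototypes, classes
W = rng.normal(size=(d, D))        # low-dimensional projection
B = rng.normal(size=(m, d))        # prototypes in projected space
Z = rng.normal(size=(m, L))        # per-prototype label scores
gamma = 1.0                        # RBF kernel width

def predict(x):
    """Score classes by RBF-weighted prototype labels; tiny memory footprint."""
    z = W @ x                                          # project the input
    sim = np.exp(-gamma * ((B - z) ** 2).sum(axis=1))  # similarity to prototypes
    return int(np.argmax(sim @ Z))                     # predicted class

x = rng.normal(size=D)
print("predicted class:", predict(x))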


Variational problems on random structures and their continuum limits

Dejan Slepčev (Carnegie Mellon University)
E18-304

Abstract: We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts, graph Laplacians…
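A minimal sketch of the graph construction described above (the Gaussian kernel and fixed bandwidth are illustrative choices; the talk concerns how such functionals behave as the number of points grows and the bandwidth scales with it):

# Build a weighted geometric graph on a random point cloud and form its
# unnormalized graph Laplacian L = D - A.
import numpy as np

rng = np.random.default_rng(3)
n, eps = 200, 0.3
pts = rng.uniform(size=(n, 2))                 # sample of the ground-truth measure

# pairwise squared distances and Gaussian edge weights
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-sq / eps ** 2)
np.fill_diagonal(A, 0.0)

L = np.diag(A.sum(axis=1)) - A                 # unnormalized graph Laplacian

# Dirichlet energy u^T L u = (1/2) * sum_ij w_ij (u_i - u_j)^2, a discrete
# functional of the kind whose continuum limit is studied
u = pts[:, 0]
print("Dirichlet energy:", u @ L @ u)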


An Information-Geometric View of Learning in High Dimensions

Gregory Wornell (MIT)
32-155

Abstract: We consider the problem of data feature selection prior to inference task specification, which is central to high-dimensional learning. Introducing natural notions of universality for such problems, we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and represents a conceptually and practically useful learning methodology. The development reveals the key roles of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, canonical correlation and principal component analyses, Tishby's information bottleneck, Wyner's common information, Ky Fan…
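One concrete instance of the SVD connection: for discrete X and Y, the Hirschfeld-Gebelein-Rényi maximal correlation equals the second-largest singular value of the matrix with entries $P(x,y)/\sqrt{P(x)P(y)}$. A small sketch (the joint distribution below is made up for illustration):

# HGR maximal correlation of discrete (X, Y) via the SVD of the matrix
# B_xy = P(x,y) / sqrt(P(x) P(y)).
import numpy as np

P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.30, 0.05],
              [0.05, 0.05, 0.20]])   # joint pmf of (X, Y); entries sum to 1

px = P.sum(axis=1)                   # marginal of X
py = P.sum(axis=0)                   # marginal of Y
B = P / np.sqrt(np.outer(px, py))

s = np.linalg.svd(B, compute_uv=False)
# The largest singular value is always 1 (achieved by constant functions);
# the second-largest is the HGR maximal correlation.
print("singular values:", np.round(s, 4))
print("maximal correlation:", round(s[1], 4))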


Strong data processing inequalities and information percolation

Yury Polyanskiy (MIT)
32-D677

Abstract: The data-processing inequality, that is, $I(U;Y) \le I(U;X)$ for a Markov chain $U \to X \to Y$, has been the method of choice for proving impossibility (converse) results in information theory and many other disciplines. A channel-dependent improvement is called the strong data-processing inequality (or SDPI). In this talk we will: a) review SDPIs; b) show how point-to-point SDPIs can be combined into an SDPI for a network; c) show recent…
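For reference, the SDPI constant of a channel $P_{Y|X}$ is

$$\eta(P_{Y|X}) \;=\; \sup_{U \to X \to Y,\; I(U;X) > 0} \frac{I(U;Y)}{I(U;X)} \;\le\; 1,$$

so the strong data-processing inequality reads $I(U;Y) \le \eta\, I(U;X)$. A standard example: for the binary symmetric channel with crossover probability $\delta$, $\eta = (1-2\delta)^2$.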


Graphical models under total positivity

Caroline Uhler (MIT)
32-D677

Abstract: We discuss properties of distributions that are multivariate totally positive of order two (MTP2). Such distributions appear in the context of positive dependence, ferromagnetism in the Ising model, and various latent models. While such distributions have a long history in probability theory and statistical physics, in this talk I will discuss such distributions in the context of high-dimensional statistics and graphical models. In particular, I will show that MTP2 in the Gaussian…
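For reference, a distribution with density $p$ is MTP2 if

$$p(x)\,p(y) \;\le\; p(x \wedge y)\,p(x \vee y) \quad \text{for all } x, y,$$

where $\wedge$ and $\vee$ denote the coordinatewise minimum and maximum. In the Gaussian case, a classical fact due to Karlin and Rinott is that $N(\mu, \Sigma)$ is MTP2 exactly when the precision matrix $K = \Sigma^{-1}$ is an M-matrix, i.e. $K_{ij} \le 0$ for all $i \ne j$.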



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764