Variable selection using presence-only data with applications to biochemistry

Garvesh Raskutti (University of Wisconsin)
E18-304

Abstract: In a number of problems, we are presented with positive and unlabelled data, referred to as presence-only responses. The application I present today involves studying the relationship between protein sequence and function; presence-only data arise because for many experiments it is impossible to obtain a large set of negative (non-functional) sequences. Furthermore, if the number of variables is large and the goal is variable selection (as in this case), a number of statistical and computational challenges arise due…

User-friendly guarantees for the Langevin Monte Carlo

Arnak Dalalyan (ENSAE-CREST)
E18-304

Abstract: In this talk, I will revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm for sampling from a smooth and (strongly) log-concave density. I will discuss the existing results when the accuracy of sampling is measured in the Wasserstein distance and provide further insights on relations between, on the one hand, Langevin Monte Carlo for sampling and, on the other hand, gradient descent for optimization. I will also present non-asymptotic guarantees for the accuracy…

Optimization’s Implicit Gift to Learning: Understanding Optimization Bias as a Key to Generalization

Nathan Srebro-Bartom (TTI-Chicago)
E18-304

Abstract: It is becoming increasingly clear that the implicit regularization afforded by optimization algorithms plays a central role in machine learning, especially when using large, deep neural networks. We have a good understanding of the implicit regularization afforded by stochastic approximation algorithms, such as SGD, and as I will review, we understand and can characterize the implicit bias of different algorithms, and can design algorithms with specific biases. But in this talk I will focus on implicit biases of…

One and two sided composite-composite tests in Gaussian mixture models

Alexandra Carpentier (Otto von Guericke Universitaet)
E18-304

Abstract: Finding an efficient test for a testing problem is often linked to the problem of estimating a given function of the data. When this function is not smooth, it is necessary to approximate it cleverly in order to build good tests. In this talk, we will discuss two specific testing problems in Gaussian mixture models. In both, the aim is to test the proportion of null means. The aforementioned link between sharp approximation rates of non-smooth objects and minimax testing…

Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment

Afonso Bandeira (NYU)
E18-304

Abstract: Many problems in signal/image processing and computer vision amount to estimating a signal, image, or three-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is a latent transformation of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in Robotics and Computer Vision, where pictures of a scene are obtained from different positions and orientations;…

When Inference is tractable

David Sontag (MIT)
E18-304

Abstract: A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as modules within deep architectures. But when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AISTATS ’18), structure in…

Statistical theory for deep neural networks with ReLU activation function

Johannes Schmidt-Hieber (Leiden)
E18-304

Abstract: The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does not, however, guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown…

Optimality of Spectral Methods for Ranking, Community Detections and Beyond

Jianqing Fan (Princeton University)
E18-304

Abstract: Spectral methods have been widely used for a large class of challenging problems, including top-K ranking via pairwise comparisons, community detection, and factor analysis, among others. Analyses of these spectral methods require super-norm perturbation analysis of top eigenvectors. This allows us to UNIFORMLY approximate elements in eigenvectors by linear functions of the observed random matrix that can be analyzed further. We first establish such an infinity-norm perturbation bound for top eigenvectors and apply the idea to several challenging problems…

Testing degree corrections in Stochastic Block Models

Subhabrata Sen (Microsoft)
E18-304

Abstract: The community detection problem has attracted significant attention in recent years, and it has been studied extensively under the framework of a Stochastic Block Model (SBM). However, it is well-known that SBMs fit real data very poorly, and various extensions have been suggested to replicate characteristics of real data. The recovered community assignments are often sensitive to the model used, and this naturally raises the following question: Given a network with community structure, how to decide whether…

SDSCon 2018: Statistics and Data Science Center Conference

Bartos Theater

Join us at SDSCon 2018 on April 20, 2018 to hear leaders in the field of statistics and data science. SDSCon 2018 is the second annual celebration of MIT’s statistics and data science community, organized by MIT’s Statistics and Data Science Center (SDSC). The mission of SDSC is to advance research activities and academic programs in “21st Century Statistics,” whose foundations include Probability, Statistics, Computation and Data Analysis. The conference will feature presentations from established academic leaders, industry innovators, and…


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764