# Past Events


## Robust Statistics, Revisited

Starting from the seminal works of Tukey (1960) and Huber (1964), the field of robust statistics asks: Are there estimators that provably work in the presence of noise? The trouble is that all known provably robust estimators are also hard to compute in high dimensions. Here, we study a basic problem in robust statistics, posed in various forms in the above works. Given corrupted samples from a high-dimensional Gaussian, are there efficient algorithms to accurately estimate its parameters? We give the…
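
A minimal one-dimensional sketch of the phenomenon the abstract describes (not the speakers' algorithm): the sample median tolerates a small fraction of adversarial corruptions, while the sample mean can be dragged arbitrarily far.

```python
import random
import statistics

# Hedged illustration: draw from a standard Gaussian (true mean 0),
# then adversarially corrupt 5% of the samples with large outliers.
random.seed(0)
n = 1000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
for i in range(n // 20):
    samples[i] = 1000.0  # adversarial corruption

mean_est = statistics.fmean(samples)
median_est = statistics.median(samples)
assert abs(median_est) < 0.2  # the median stays near the true mean
assert abs(mean_est) > 10     # the mean is pulled far away
```

The hard part, which this toy example sidesteps, is achieving this kind of robustness efficiently in high dimensions, where naive coordinate-wise medians lose accuracy that scales with the dimension.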

## Probabilistic factorizations of big tables and networks

It is common to collect high-dimensional data that are structured as a multiway array or tensor; examples include multivariate categorical data that are organized as a contingency table, sequential data on nucleotides or animal vocalizations, and neuroscience data on brain networks. In each of these cases, there is interest in doing inference on the joint probability distribution of the data and on interpretable functionals of this probability distribution. The goal is to avoid restrictive parametric assumptions, enable both statistical and…
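
One common low-dimensional representation in this area is a PARAFAC-style factorization, which writes a joint pmf over categorical variables as a mixture of product distributions. The weights and conditional probabilities below are made-up numbers for illustration, not from the talk.

```python
import numpy as np

# Hedged sketch: p(y1, y2) = sum_h pi[h] * a[h, y1] * b[h, y2],
# a rank-2 mixture-of-products model for a 3x2 contingency table.
pi = np.array([0.6, 0.4])            # mixture weights (assumed)
a = np.array([[0.7, 0.2, 0.1],       # p(y1 | h), rows sum to 1
              [0.1, 0.3, 0.6]])
b = np.array([[0.5, 0.5],            # p(y2 | h), rows sum to 1
              [0.2, 0.8]])

# Build the implied joint cell probabilities.
p = np.einsum('h,hi,hj->ij', pi, a, b)
assert np.isclose(p.sum(), 1.0)      # a valid joint pmf

# Interpretable functionals, e.g. marginals, follow by summing out a mode.
p_y1 = p.sum(axis=1)
assert np.allclose(p_y1, pi @ a)
```

The appeal is that the number of parameters grows additively in the table dimensions rather than multiplicatively, without imposing a rigid parametric family on the joint distribution.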

## Jagers-Nerman stable age distribution theory, change point detection and power of two choices in evolving networks

Abstract: The last few years have seen an explosion in the amount of data on real world networks, including networks that evolve over time. A number of mathematical models have been proposed to understand the evolution of such networks and explain the emergence of a wide array of structural features such as heavy tailed degree distribution and small world connectivity of real networks. One sophisticated mathematical tool in the arsenal of a modern probabilist is the so-called Jagers and Nerman…
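
A toy simulation of one model family the abstract alludes to (my choice of illustration, not the speaker's): preferential attachment, where each new node links to an existing node with probability proportional to its degree, produces the skewed, heavy-tailed degrees mentioned above.

```python
import random

# Hedged sketch: grow a tree by preferential attachment. Sampling a
# uniform entry of `endpoints` (the multiset of all edge endpoints)
# is exactly sampling a node with probability proportional to degree.
random.seed(1)
n = 5000
degree = [1, 1]        # start with two linked nodes
endpoints = [0, 1]
for new in range(2, n):
    target = random.choice(endpoints)
    degree.append(1)
    degree[target] += 1
    endpoints += [new, target]

assert max(degree) > 30             # a few high-degree hubs emerge
assert sorted(degree)[n // 2] <= 2  # yet the typical node has low degree
```

Tools like the Jagers-Nerman stable age distribution give a rigorous route to the limiting degree distribution of such growth processes, which the simulation only hints at.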

## Sample-optimal inference, computational thresholds, and the methods of moments

We propose an efficient meta-algorithm for Bayesian inference problems based on low-degree polynomials, semidefinite programming, and tensor decomposition. The algorithm is inspired by recent lower bound constructions for sum-of-squares and related to the method of moments. Our focus is on sample complexity bounds that are as tight as possible (up to additive lower-order terms) and often achieve statistical thresholds or conjectured computational thresholds. Our algorithm recovers the best known bounds for partial recovery in the stochastic block model, a widely-studied…
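
As a far simpler illustration of the method of moments than the talk's meta-algorithm: equate population moments with sample moments and solve for the parameter. For Uniform(0, θ), E[X] = θ/2, so matching the first moment gives θ̂ = 2·mean(X).

```python
import random
import statistics

# Hedged one-parameter example (illustrative only): method-of-moments
# estimation of theta for Uniform(0, theta).
random.seed(0)
theta = 3.0
xs = [random.uniform(0, theta) for _ in range(100_000)]
theta_hat = 2 * statistics.fmean(xs)  # first-moment matching
assert abs(theta_hat - theta) < 0.05
```

The research question in the abstract is how far this idea extends: matching higher moments efficiently (via tensor decomposition and semidefinite programming) while keeping the sample complexity near the statistical threshold.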

## Brain and Cognitive Science & IDSS Special Seminar: Sasha Rakhlin

## Active learning with seed examples and search queries

Active learning is a framework for supervised learning that explicitly models, and permits one to control and optimize, the costs of labeling data. The hope is that by carefully selecting which examples to label in an adaptive manner, the number of labels required to learn an accurate classifier is substantially reduced. However, in many learning settings (e.g., when some classes are rare), it is difficult to identify which examples are most informative to label, and existing active learning algorithms are prone to labeling uninformative examples.

## SDSCon 2017 – Statistics and Data Science Center Conference

As part of the MIT Institute for Data, Systems, and Society (IDSS), the Statistics and Data Science Center (SDSC) is an MIT-wide focal point for advancing academic programs and research activities in statistics and data science. SDSC Day will be a celebration and community-building event for those interested in statistics. Discussions will cover applications of statistics and data science across a wide range of fields and approaches.

## Testing properties of distributions over big domains

We describe an emerging research direction regarding the complexity of testing global properties of discrete distributions, when given access to only a few samples from the distribution. Such properties might include testing if two distributions have small statistical distance, testing various independence properties, testing whether a distribution has a specific shape (such as monotone decreasing, k-modal, k-histogram, monotone hazard rate,...), and approximating the entropy. We describe bounds for such testing problems whose sample complexities are sublinear in the size of the…
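
One classical idea from this area is the collision tester: a sample from a distribution far from uniform over a domain of size n produces noticeably more pairwise collisions than the uniform baseline of about 1/n per pair. A minimal sketch (illustrative, not the speaker's construction):

```python
import random

# Hedged sketch: compare the empirical collision rate of samples from a
# uniform source versus a skewed source concentrated on 10% of the domain.
random.seed(0)
n, m = 100, 2000  # domain size, sample size

def collision_rate(samples):
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    pairs = len(samples) * (len(samples) - 1) / 2
    collisions = sum(c * (c - 1) / 2 for c in counts.values())
    return collisions / pairs

uniform = [random.randrange(n) for _ in range(m)]
skewed = [random.randrange(n // 10) for _ in range(m)]

assert abs(collision_rate(uniform) - 1 / n) < 0.005  # near the 1/n baseline
assert collision_rate(skewed) > 5 / n                # far more collisions
```

The point of the sublinear-sample results in the abstract is that m can be much smaller than n (here roughly √n samples already suffice for uniformity testing), so the tester never needs to see most of the domain.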

## Some related phase transitions in phylogenetics and social network analysis

Spin systems on trees have found applications ranging from the reconstruction of phylogenies to the analysis of networks with community structure. A key feature of such processes is the interplay between the growth of the tree and the decay of correlations along it. How the resulting threshold phenomena impact estimation depends on the problem considered. I will illustrate this on two recent results: 1) the critical threshold of ancestral sequence reconstruction by maximum parsimony on general phylogenies and 2) the…
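
A toy version of the broadcast process underlying these questions (my simplification, not the talk's results): a root bit is copied down a binary tree, each edge flipping it with probability p, and we try to reconstruct the root from the leaves by majority vote. In the low-noise regime this beats random guessing.

```python
import random

# Hedged sketch: broadcasting on a depth-d binary tree with flip
# probability p per edge, then root reconstruction by leaf majority.
random.seed(0)
p, depth, trials = 0.1, 8, 500

def broadcast_leaves(bit, d):
    if d == 0:
        return [bit]
    left = bit ^ (random.random() < p)   # each child flips independently
    right = bit ^ (random.random() < p)
    return broadcast_leaves(left, d - 1) + broadcast_leaves(right, d - 1)

correct = 0
for _ in range(trials):
    root = random.randrange(2)
    leaves = broadcast_leaves(root, depth)
    guess = int(sum(leaves) * 2 > len(leaves))  # majority vote
    correct += (guess == root)

assert correct / trials > 0.6  # beats the 0.5 baseline at low noise
```

The threshold phenomena in the abstract concern exactly this interplay: as p grows, correlations decay along the tree faster than the tree branches, and at a critical noise level reconstruction (by majority, parsimony, or any estimator) stops being possible.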

## Invariance and Causality

Why are we interested in the causal structure of a process? In classical prediction tasks, for example, it seems that no causal knowledge is required. In many situations, however, we are interested in a system's behavior after parts of this system have been changed. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction (which uses only direct causes of the target variable as predictors) remains valid even if we intervene on predictor…
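
A small numerical sketch of the invariance idea (my illustration of the principle, not the speaker's method): in the causal model Y = 2X + noise, the regression coefficient of Y on X stays the same across an environment that intervenes on how X is generated, whereas a non-causal summary such as E[Y] shifts.

```python
import random

# Hedged sketch: compare two environments; the second intervenes on X
# by shifting its mean. The structural equation Y = 2*X + noise is fixed.
random.seed(0)

def sample(env_shift, n=50_000):
    xs = [random.gauss(env_shift, 1.0) for _ in range(n)]
    ys = [2 * x + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

for shift in (0.0, 3.0):  # observational vs intervened environment
    xs, ys = sample(shift)
    assert abs(slope(xs, ys) - 2.0) < 0.05  # causal coefficient is invariant

xs0, ys0 = sample(0.0)
xs3, ys3 = sample(3.0)
assert sum(ys3) / len(ys3) - sum(ys0) / len(ys0) > 5  # E[Y] is not invariant
```

Turning this observation around gives a route to causal discovery: among candidate predictor sets, search for the one whose predictive relationship stays invariant across environments.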
