Views Navigation

Event Views Navigation

Random polytopes and estimation of convex bodies

Victor-Emmanuel Brunel (Yale)
E17-133

In this talk we discuss properties of random polytopes. In particular, we study the convex hull of i.i.d. random points, whose law is supported on a convex body. We propose deviation and moment inequalities for this random polytope, and then discuss its optimality, when it is seen as an estimator of the support of the probability measure, which may be unknown. We also define a notion of multidimensional quantile sets for probability measures in a Euclidean space. These are convex…

Find out more »

The exact k-SAT threshold for large k

Nike Sun (MSR New England and MIT Mathematics)
E62-450

We establish the random k-SAT threshold conjecture for all k exceeding an absolute constant k0. That is, there is a single critical value α∗(k) such that a random k-SAT formula at clause-to-variable ratio α is with high probability satisfiable for αα∗(k). The threshold α∗(k) matches the explicit prediction derived by statistical physicists on the basis of the one-step replica symmetry breaking (1RSB) heuristic. In the talk I will describe the main obstacles in computing the threshold, and explain how they…

Find out more »

How good is your model? Guilt-free interactive data analysis.

Moritz Hardt (IBM Almaden)
E62-450

Reliable tools for model selection and validation are indispensable in almost all applications of machine learning and statistics. Decades of theory support a widely used set of techniques, such as holdout sets, bootstrapping and cross validation methods. Yet, much of the theory breaks down in the now common situation where the data analyst works interactively with the data, iteratively choosing which methods to use by probing the same data many times. A good example are data science competitions in which…

Find out more »

From Bandits to Ethical Clinical Trials. Optimal Sample Size for Multi-Phases Problems.

Vianney Perchet (Université Paris Diderot)
E62-450

In the first part of this talk, I will present recent results on the problem of sequential allocations called “multi-armed bandit”. Given several i.i.d. processes, the objective is to sample them sequentially (and thus get a sequence of random rewards) in order to maximize the expected cumulative reward. This framework simultaneously encompasses issues of estimation and optimization (the so-called “exploration vs exploitation” dilemma). A recent example of applications is the ad placement on web sites. In the second part, I…

Find out more »

Tensor Prediction, Rademacher Complexity and Random 3-XOR

Ankur Moitra (MIT CSAIL)
E62-450

Here we study the tensor prediction problem, where the goal is to accurately predict the entries of a low rank, third-order tensor (with noise) given as few observations as possible. We give algorithms based on the sixth level of the sum-of-squares hierarchy that work with roughly m=n3/2 observations, and we complement our result by showing that any attempt to solve tensor prediction with fewer observations through the sum-of-squares hierarchy would run in moderately exponential time. In contrast, information theoretically roughly…

Find out more »

Nonparametric Graph Estimation

Han Liu (Princeton)
E62-450

Undirected graphical model has proven to be a useful abstraction in statistics and machine learning. The starting point is the graph of a distribution. While often the graph is assumed given, we are interested in estimating the graph from data. In this talk we present new nonparametric and semiparametric methods for graph estimation. The performance of these methods is illustrated and compared on several real and simulated examples.

Find out more »

Measuring Sample Quality with Stein’s Method

Lester Mackey (Stanford)
E62-450

To carry out posterior inference on datasets of unprecedented sizes, practitioners are turning to biased MCMC procedures that trade off asymptotic exactness for computational efficiency. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality…

Find out more »

An Extended Frank-Wolfe Method with Application to Low-Rank Matrix Completion

Rob Freund (MIT Sloan)
32-124

We present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. We present computational guarantees for the method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply our method to the low-rank matrix completion problem, where low-dimensional faces correspond to low-rank solutions. We present computational results for large-scale low-rank matrix completion problems that demonstrate significant speed-ups in…

Find out more »

Making Good Policies with Bad Causal Inference: The Role of Prediction and Machine Learning

In the last few decades, we have learned to be careful about causation, and have developed powerful tools for making causal inferences from data. Applying these tools has generated both policy impact and conceptual insights. Prof. Mullainathan will argue that there are a large class of problems where causal inference is largely unnecessary where, instead, prediction is the central challenge. These problems are ideally suited to machine learning and high dimensional data analysis tools. In this talk he will (1)…

Find out more »

Influence maximization in stochastic and adversarial settings

We consider the problem of influence maximization in fixed networks, for both stochastic and adversarial contagion models. In the stochastic setting, nodes are infected in waves according to linear threshold or independent cascade models. We establish upper and lower bounds for the influence of a subset of nodes in the network, where the influence is defined as the expected number of infected nodes at the conclusion of the epidemic. We quantify the gap between our upper and lower bounds in…

Find out more »


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764