Views Navigation

Event Views Navigation

Measuring Sample Quality with Stein’s Method

Lester Mackey (Stanford)
E62-450

To carry out posterior inference on datasets of unprecedented sizes, practitioners are turning to biased MCMC procedures that trade off asymptotic exactness for computational efficiency. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality…

Find out more »

An Extended Frank-Wolfe Method with Application to Low-Rank Matrix Completion

Rob Freund (MIT Sloan)
32-124

We present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. We present computational guarantees for the method that trade off efficiency in computing near-optimal solutions with upper bounds on the dimension of minimal faces of iterates. We apply our method to the low-rank matrix completion problem, where low-dimensional faces correspond to low-rank solutions. We present computational results for large-scale low-rank matrix completion problems that demonstrate significant speed-ups in…

Find out more »

Making Good Policies with Bad Causal Inference: The Role of Prediction and Machine Learning

In the last few decades, we have learned to be careful about causation, and have developed powerful tools for making causal inferences from data. Applying these tools has generated both policy impact and conceptual insights. Prof. Mullainathan will argue that there are a large class of problems where causal inference is largely unnecessary where, instead, prediction is the central challenge. These problems are ideally suited to machine learning and high dimensional data analysis tools. In this talk he will (1)…

Find out more »

Influence maximization in stochastic and adversarial settings

We consider the problem of influence maximization in fixed networks, for both stochastic and adversarial contagion models. In the stochastic setting, nodes are infected in waves according to linear threshold or independent cascade models. We establish upper and lower bounds for the influence of a subset of nodes in the network, where the influence is defined as the expected number of infected nodes at the conclusion of the epidemic. We quantify the gap between our upper and lower bounds in…

Find out more »

Independent sets, local algorithms and random regular graphs

Mustazee Rahman (MIT Mathematics)
32-124

A independent set in a graph is a set of vertices that have no edges between them. How large can a independent set be in a random d-regular graph? How large can it be if we are to construct it using a (possibly randomized) algorithm that is local in nature? We will discuss a notion of local algorithms for combinatorial optimization problems on large, random d-regular graphs. We will then explain why, for asymptotically large d, local algorithms can only…

Find out more »

Some Fundamental Ideas for Causal Inference on Large Networks

Edo Airoldi (Harvard University)
32-141

Classical approaches to causal inference largely rely on the assumption of “lack of interference”, according to which the outcome of each individual does not depend on the treatment assigned to others. In many applications, however, including healthcare interventions in schools, online education, and design of online auctions and political campaigns on social media, assuming lack of interference is untenable. In this talk, Prof. Airoldi will introduce some fundamental ideas to deal with interference in causal analyses, focusing on situations where…

Find out more »

Fast algorithms and (other) min-max optimal algorithms for mixed regression

Constantine Caramanis (University of Texas at Austin)
32-141

ixture models represent the superposition of statistical processes, and are natural in machine learning and statistics. In mixed regression, the relationship between input and output is given by one of possibly several different (noisy) linear functions. Thus the solution encodes a combinatorial selection problem, and hence computing it is difficult in the worst case. Even in the average case, little is known in the realm of efficient algorithms with strong statistical guarantees. We give general conditions for linear convergence of…

Find out more »

Discovering hidden structures in complex networks

Roman Vershynin (University of Michigan)
32-141

Most big real-world networks (social, technological, biological) are sparse. Most of networks have noticeable structure, which can be formed by clusters (communities) and hubs. When and how can a hidden structure be recovered from a sparse network? Known approaches to this problem come from a variety of disciplines – probability, combinatorics, physics, statistics, optmization, information theory, etc. We will focus on the recently developed probabilistic approaches motivated by sparse recovery, where a network is regarded as a random measurement of…

Find out more »

A Big Data System for Things That Move

The world consists of many interesting things that move: people go to work, home, school, and shop in public transit buses and trains, or in cars and taxis; goods move on these networks and by trucks or by air each day; and food items travel a large distance to meet their eater. Thus, massive movement processes are underway in the world every day and it is critical to ensure their safe, timely and efficient operation. Towards this end, low-cost sensing…

Find out more »

Causal Inference with Random Forests

Stefan Wager (Stanford University)
32-141

Many scientific and engineering challenges---ranging from personalized medicine to customized marketing recommendations---require an understanding of treatment heterogeneity. We develop a non-parametric causal forest for estimating heterogeneous treatment effects that is closely inspired by Breiman's widely used random forest algorithm. Given a potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also propose a practical estimator for the asymptotic variance of causal…

Find out more »


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764