Stochastics and Statistics Seminar


Model-agnostic covariate-assisted inference on partially identified causal effects

Lihua Lei, Stanford University
E18-304

Abstract: Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Existing approaches may therefore fail when these conditional models are misspecified or the required consistency fails to hold. In this study, we propose a unified and model-agnostic inferential approach…
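The role of covariates here can be illustrated with the classical Fréchet-Hoeffding bounds for binary potential outcomes: averaging stratum-wise bounds is never looser than bounding with pooled marginals. The sketch below uses hypothetical stratum probabilities (not from the talk) purely to show the sharpening effect.

```python
# Illustrative only: Frechet-Hoeffding bounds on the joint probability
# P(Y(1)=1, Y(0)=0) from the marginals p1 = P(Y(1)=1) and p0 = P(Y(0)=1),
# which are identified in a randomized experiment.
def frechet_bounds(p1, p0):
    lower = max(0.0, p1 - p0)      # lower bound on P(Y(1)=1, Y(0)=0)
    upper = min(p1, 1.0 - p0)      # upper bound
    return lower, upper

# Two hypothetical covariate strata of equal weight: (weight, p1, p0).
strata = [(0.5, 0.9, 0.2), (0.5, 0.3, 0.8)]

# Bounds that ignore covariates, computed from pooled marginals.
p1_pooled = sum(w * p1 for w, p1, _ in strata)
p0_pooled = sum(w * p0 for w, _, p0 in strata)
pooled = frechet_bounds(p1_pooled, p0_pooled)

# Covariate-assisted bounds: average the stratum-wise bounds.
lo = sum(w * frechet_bounds(p1, p0)[0] for w, p1, p0 in strata)
hi = sum(w * frechet_bounds(p1, p0)[1] for w, p1, p0 in strata)

print(round(pooled[0], 3), round(pooled[1], 3))  # pooled interval
print(round(lo, 3), round(hi, 3))                # sharper stratified interval
```

Since max(0, ·) is convex and min(·, ·) is concave, the stratified interval is always contained in the pooled one; with these numbers it shrinks from [0.1, 0.5] to [0.35, 0.5].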


Large cycles for the interchange process

Allan Sly, Princeton University
E18-304

Abstract: The interchange process $\sigma_T$ is a random permutation valued stochastic process on a graph evolving in time by transpositions on its edges at rate 1. On $\mathbb{Z}^d$, when $T$ is small all the cycles of the permutation $\sigma_T$ are finite almost surely but it is conjectured that infinite cycles appear in dimensions 3 and higher for large times. In this talk I will focus on the finite volume case where we establish that macroscopic cycles with Poisson-Dirichlet statistics appear for large times in…
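The dynamics are easy to simulate on a small finite graph. The toy sketch below (my own illustration, not code from the talk) runs the process on the cycle graph with n vertices, approximating the Poisson number of edge rings by its mean, and then reads off the cycle structure of the resulting permutation.

```python
import random

# Toy simulation of the interchange process on the cycle graph Z_n
# (a finite-volume stand-in): each edge rings at rate 1, and at each
# ring the permutation is composed with the transposition on that edge.
random.seed(0)

n, T = 200, 2.0
perm = list(range(n))              # sigma_0 = identity

# Superposing n rate-1 edge clocks gives one rate-n clock that picks a
# uniform edge; the Poisson(n*T) number of rings by time T is
# approximated here by its mean for simplicity.
num_swaps = int(n * T)
for _ in range(num_swaps):
    i = random.randrange(n)        # edge (i, i+1 mod n)
    j = (i + 1) % n
    perm[i], perm[j] = perm[j], perm[i]

# Extract the cycle structure of sigma_T.
seen, cycles = [False] * n, []
for start in range(n):
    if not seen[start]:
        length, k = 0, start
        while not seen[k]:
            seen[k] = True
            k = perm[k]
            length += 1
        cycles.append(length)

print(sorted(cycles, reverse=True)[:5])   # the largest cycle lengths
```

Tracking how the largest cycle lengths grow with T is exactly the kind of statistic the Poisson-Dirichlet conjecture concerns.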


Trees and V’s: Inference for Ensemble Models

Giles Hooker, Wharton School - UPenn
E18-304

Abstract: This talk discusses uncertainty quantification and inference using ensemble methods. Recent theoretical developments inspired by random forests have cast bagging-type methods as U-statistics when bootstrap samples are replaced by subsamples, resulting in a central limit theorem and hence the potential for inference. However, to carry this out requires estimating a variance for which all proposed estimators exhibit substantial upward bias. In this talk, we convert subsamples without replacement to subsamples with replacement resulting in V-statistics for which we prove…
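The U- versus V-statistic distinction comes down to how subsamples are drawn. As a minimal illustration (mine, not the speaker's code), take the base learner to be just the mean of a subsample, so the ensemble itself is a subsampled statistic; drawing without replacement gives the U-statistic form, drawing with replacement the V-statistic form.

```python
import random
import statistics

# Toy ensemble: the "base learner" is the mean of a subsample, so the
# ensemble average is itself a (U- or V-)statistic of the data.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
k, B = 20, 500        # subsample size, number of base learners

# U-statistic flavour: subsamples drawn WITHOUT replacement.
ens_u = statistics.mean(
    statistics.mean(random.sample(data, k)) for _ in range(B)
)

# V-statistic flavour: subsamples drawn WITH replacement.
ens_v = statistics.mean(
    statistics.mean(random.choices(data, k=k)) for _ in range(B)
)

print(round(ens_u, 3), round(ens_v, 3), round(statistics.mean(data), 3))
```

Both ensembles concentrate around the full-sample mean as B grows; the inferential payoff discussed in the talk is that the with-replacement version admits variance estimators without the upward bias of the U-statistic case.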


Central Limit Theorems for Smooth Optimal Transport Maps

Tudor Manole, MIT
E18-304

Abstract: One of the central objects in the theory of optimal transport is the Brenier map: the unique monotone transformation which pushes forward an absolutely continuous probability law onto any other given law. Recent work has identified a class of plugin estimators of Brenier maps which achieve the minimax L^2 risk, and are simple to compute. In this talk, we show that such estimators obey pointwise central limit theorems. This provides a first step toward the question of performing statistical…
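In one dimension the Brenier map has a closed form, the monotone rearrangement T = Q⁻¹ ∘ F (target quantile function composed with the source CDF), so a natural plug-in estimator substitutes empirical versions of both. The sketch below (an illustration under Gaussian assumptions, not the estimator class from the talk) shows the idea.

```python
import random
import bisect

# 1-D plug-in estimator of the Brenier map T = Q^{-1} o F, where F is
# the source CDF and Q^{-1} the target quantile function, both replaced
# by their empirical counterparts.
random.seed(2)
n = 5000
xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))   # source sample
ys = sorted(random.gauss(2.0, 1.0) for _ in range(n))   # target sample

def t_hat(x):
    # empirical CDF rank of x in the source, clipped into {1, ..., n}
    r = min(max(bisect.bisect_right(xs, x), 1), n)
    return ys[r - 1]          # matching empirical quantile of the target

# The true map between N(0,1) and N(2,1) is the shift x -> x + 2.
print(round(t_hat(0.0), 3))   # close to 2.0
```

Pointwise central limit theorems of the kind discussed in the talk describe the fluctuations of such estimates around the true map at a fixed point x.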


Sampling through optimization of divergences on the space of measures

Anna Korba, ENSAE/CREST
E18-304

Abstract: Sampling from a target measure when only partial information is available (e.g. unnormalized density as in Bayesian inference, or true samples as in generative modeling) is a fundamental problem in computational statistics and machine learning. The sampling problem can be cast as the optimization, over the space of probability distributions, of a well-chosen discrepancy, e.g. a divergence or distance to the target. In this talk, I will discuss several properties of sampling algorithms for some choices of discrepancies (standard ones,…
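A canonical instance of this viewpoint is the unadjusted Langevin algorithm, which discretizes the Wasserstein gradient flow of KL(· ‖ π). The sketch below (a standard textbook example, not specific to the talk) samples from π = N(2, 1) using only its score ∇log π(x) = −(x − 2).

```python
import random
import statistics

# Unadjusted Langevin algorithm: x <- x + h * grad log pi(x) + sqrt(2h) * xi,
# a time-discretized Wasserstein gradient flow of KL(. || pi).
random.seed(3)
h, burn, keep = 0.01, 10_000, 10_000

x, samples = 0.0, []
for t in range(burn + keep):
    x = x + h * (-(x - 2.0)) + (2 * h) ** 0.5 * random.gauss(0.0, 1.0)
    if t >= burn:
        samples.append(x)

# Empirical mean and variance should be near the target's (2, 1),
# up to the O(h) discretization bias.
print(round(statistics.mean(samples), 2), round(statistics.pvariance(samples), 2))
```

Other choices of discrepancy (e.g. kernel Stein discrepancies or the maximum mean discrepancy) lead to different particle dynamics, which is the comparison the talk takes up.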


A Flexible Defense Against the Winner’s Curse

Tijana Zrnic, Stanford University
E18-304

Abstract: Across science and policy, decision-makers often need to draw conclusions about the best candidate among competing alternatives. For instance, researchers may seek to infer the effectiveness of the most successful treatment or determine which demographic group benefits most from a specific treatment. Similarly, in machine learning, practitioners are often interested in the population performance of the model that empirically performs best. However, cherry-picking the best candidate leads to the winner’s curse: the observed performance for the winner is biased…
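The bias is easy to see in simulation. In the sketch below (my own illustration), all K candidates have the same true effect of zero, yet the reported performance of the empirical winner is systematically inflated by roughly the expected maximum of K standard normals.

```python
import random
import statistics

# Winner's-curse simulation: K candidates with identical true effect 0;
# we record the empirical performance of the apparent winner.
random.seed(4)
K, trials = 10, 2000

winners = []
for _ in range(trials):
    estimates = [random.gauss(0.0, 1.0) for _ in range(K)]
    winners.append(max(estimates))   # cherry-picked best candidate

bias = statistics.mean(winners)      # true value of every candidate is 0
print(round(bias, 2))                # roughly E[max of 10 N(0,1)], about 1.5
```

Any naive confidence interval centered at the winner's estimate therefore under-covers, which is the problem the proposed defense addresses.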


The Conflict Graph Design: Estimating Causal Effects Under Interference

Christopher Harshaw, Columbia University
E18-304

Abstract: From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been a growing interest in causal inference under interference, where treatment given to one unit can affect outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence is relatively under-explored. Not only are the optimal rates of estimation for different…


Scaling Limits of Neural Networks

Boris Hanin, Princeton University
E18-304

Abstract: Neural networks are often studied analytically through scaling limits: regimes in which structural network parameters such as depth, width, and the number of training datapoints are taken to infinity, yielding simplified models of learning. I will survey several such approaches with the goal of illustrating the rich and still not fully understood space of possible behaviors when some or all of the network's structural parameters are large.

Bio: Boris Hanin is an Assistant Professor at Princeton Operations Research and Financial…


Evaluating a black-box algorithm: stability, risk, and model comparisons

Rina Foygel Barber, University of Chicago
E18-304

Abstract: When we run a complex algorithm on real data, it is standard to use a holdout set, or a cross-validation strategy, to evaluate its behavior and performance. When we do so, are we learning information about the algorithm itself, or only about the particular fitted model(s) that this particular data set produced? In this talk, we will establish fundamental hardness results on the problem of empirically evaluating properties of a black-box algorithm, such as its stability and its average…
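One concrete property of this type is algorithmic stability. The sketch below (my own toy example, not the talk's framework) treats the fitting routine as a black box and probes its leave-one-out stability empirically: how much can the fitted model's prediction move when a single training point is removed?

```python
import random
import statistics

# Treat the fitting routine as a black box mapping a dataset to a
# prediction rule, and probe its leave-one-out stability empirically.
random.seed(5)

def fit(dataset):
    # Black-box "algorithm" for illustration: a constant mean predictor.
    mu = statistics.mean(dataset)
    return lambda x: mu

data = [random.gauss(0.0, 1.0) for _ in range(100)]
x0 = 0.0
full_pred = fit(data)(x0)

# Stability profile: prediction change when each point is dropped in turn.
changes = [
    abs(fit(data[:i] + data[i + 1:])(x0) - full_pred)
    for i in range(len(data))
]
print(round(max(changes), 4))   # worst-case leave-one-out perturbation
```

The hardness results in the talk concern what such finite, data-driven probes can and cannot certify about the algorithm itself, as opposed to the particular fitted models they happen to produce.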


Statistical Inference with Limited Memory

Ofer Shayevitz, Tel Aviv University
E18-304

Abstract: In statistical inference problems, we are typically given a limited number of samples from some underlying distribution, and we wish to estimate some property of that distribution, under a given measure of risk. We are usually interested in characterizing and achieving the best possible risk as a function of the number of available samples. Thus, it is often implicitly assumed that samples are co-located, and that communication bandwidth as well as computational power are not a bottleneck, essentially making the number…
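A simple example of what a memory or communication constraint costs: compress each sample of N(μ, 1) to a single bit 1{X > 0} and recover μ by inverting the standard normal CDF. The sketch below (a standard illustration, not from the talk) shows that estimation remains possible, at a constant-factor loss in accuracy relative to the unconstrained sample mean.

```python
import random
from statistics import NormalDist

# Communication-constrained estimation: each sample is reduced to one
# bit 1{X > 0}; since P(X > 0) = Phi(mu) for X ~ N(mu, 1), the mean is
# recovered by inverting the standard normal CDF.
random.seed(6)
mu_true, n = 0.5, 20_000

bits = [1 if random.gauss(mu_true, 1.0) > 0 else 0 for _ in range(n)]
p_hat = sum(bits) / n                     # estimate of Phi(mu)
mu_hat = NormalDist().inv_cdf(p_hat)      # one-bit estimator of mu
print(round(mu_hat, 3))                   # close to mu_true = 0.5
```

Characterizing how the optimal risk degrades with the number of available bits or states, rather than just the number of samples, is the theme of the talk.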



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764