Statistics and Data Science Seminar


  • Scaling Limits of Neural Networks

    On November 8, 2024, from 11:00 am to 12:00 pm
    Boris Hanin, Princeton University
    E18-304

    Abstract: Neural networks are often studied analytically through scaling limits: regimes in which taking structural network parameters such as depth, width, and the number of training datapoints to infinity yields simplified models of learning. I will survey several such approaches, with the goal of illustrating the rich and still not fully understood space of possible behaviors when some or all of the network’s structural parameters are large.
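
    As a toy illustration of one such limit (my own sketch, not material from the talk): under the standard 1/sqrt(width) scaling, the output of a randomly initialized one-hidden-layer network at a fixed input settles into a limiting Gaussian distribution as the width grows. All sizes and parameters below are illustrative assumptions.

    ```python
    import numpy as np

    # Hypothetical demo: the empirical standard deviation of the network output
    # stabilizes as width grows, consistent with a Gaussian infinite-width limit.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    x /= np.linalg.norm(x)                        # fixed unit-norm input
    for width in (10, 100, 1000):
        outs = []
        for _ in range(2000):                     # fresh random initialization per draw
            W1 = rng.normal(size=(width, 5))      # first-layer weights
            w2 = rng.normal(size=width)           # second-layer weights
            outs.append(w2 @ np.tanh(W1 @ x) / np.sqrt(width))
        print(width, np.std(outs))                # converges as width -> infinity
    ```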

    Bio:

    Boris Hanin is an Assistant Professor in Operations Research and Financial Engineering at Princeton, working on deep learning, probability, and spectral asymptotics. Prior to Princeton, he was an Assistant Professor in Mathematics at Texas A&M and an NSF Postdoc in Mathematics at MIT. He is also an advisor and member of the technical staff at Foundry, an AI and computing startup.

  • The Conflict Graph Design: Estimating Causal Effects Under Interference

    On November 1, 2024, from 11:00 am to 12:00 pm
    Christopher Harshaw, Columbia University
    E18-304

    Abstract:
    From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been growing interest in causal inference under interference, where treatment given to one unit can affect the outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence is relatively under-explored. Not only are the optimal rates of estimation for different causal effects under interference an open question, but previously proposed designs have been constructed in an ad hoc fashion.

    In this talk, we present the Conflict Graph Design, a new approach for constructing experimental designs to estimate causal effects under interference. Given a particular causal estimand (e.g., the total treatment effect, direct effect, or spill-over effect), we construct a so-called “conflict graph” which captures the fundamental unobservability associated with the estimand on the underlying network. The Conflict Graph Design randomly assigns treatment by first assigning “desired” exposures and then resolving conflicts among these desired exposures according to an algorithmically constructed importance ordering. In this way, the proposed experimental design depends on both the underlying network and the causal estimand under investigation. We show that a modified Horvitz–Thompson estimator attains a variance of $O(\lambda/n)$ under the design, where $\lambda$ is the largest eigenvalue of the adjacency matrix of the conflict graph, a global measure of connectivity. These rates improve upon the best known rates for a variety of estimands (e.g., total treatment effects and direct effects), and we conjecture that this rate is optimal. Finally, we provide consistent variance estimators and asymptotically valid confidence intervals, which facilitate inference on the causal effect under investigation.

    Joint work with Vardis Kandiros, Charis Pipis, and Costis Daskalakis at MIT.
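
    A minimal numerical sketch of the rate statement (not the paper's construction, which builds the conflict graph from the estimand and network): given a conflict graph's adjacency matrix, the quantity $\lambda$ driving the $O(\lambda/n)$ variance is its largest eigenvalue, and the estimator is a Horvitz–Thompson reweighting. The random graph and exposure probabilities below are illustrative assumptions.

    ```python
    import numpy as np

    # Assumed stand-in conflict graph: a sparse Erdos-Renyi adjacency matrix.
    rng = np.random.default_rng(0)
    n = 200
    A = np.triu(rng.random((n, n)) < 0.03, k=1).astype(float)
    A = A + A.T                                   # symmetric, zero diagonal
    lam = np.linalg.eigvalsh(A).max()             # largest eigenvalue of the conflict graph
    print(f"variance rate O(lambda/n) ~ {lam / n:.4f}")

    # Generic Horvitz-Thompson contrast: reweight each observed outcome by the
    # probability (assumed known from the design) of its realized exposure.
    def horvitz_thompson(y, is_exposed, p_exposed, is_control, p_control):
        return np.mean(y * is_exposed / p_exposed) - np.mean(y * is_control / p_control)
    ```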

    Bio:

    Christopher Harshaw is an Assistant Professor in the Columbia Statistics Department. He received his PhD from Yale University and was a FODSI postdoc hosted jointly by UC Berkeley and MIT. His research lies at the interface of causal inference, algorithm design, and machine learning, with a particular focus on the design and analysis of randomized experiments. His work appears in the Journal of the American Statistical Association, the Electronic Journal of Statistics, COLT, ICML, and NeurIPS, and won the Best Paper Award at CML4Impact, a NeurIPS 2022 workshop.

  • Sampling through optimization of divergences on the space of measures

    On October 18, 2024, from 11:00 am to 12:00 pm
    Anna Korba, ENSAE/CREST
    E18-304

    Abstract:
    Sampling from a target measure when only partial information is available (e.g., an unnormalized density, as in Bayesian inference, or true samples, as in generative modeling) is a fundamental problem in computational statistics and machine learning. The sampling problem can be cast as the optimization, over the space of probability distributions, of a well-chosen discrepancy, e.g., a divergence or distance to the target. In this talk, I will discuss several properties of sampling algorithms for some choices of discrepancies (standard ones, or novel proxies), regarding both their optimization and quantization aspects.
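
    One classical instance of this viewpoint, as a minimal sketch (standard material, not the talk's novel proxies): the unadjusted Langevin algorithm is a time-discretization of the Wasserstein gradient flow of the KL divergence to a target $\pi \propto \exp(-V)$. The Gaussian target below is an assumption for illustration.

    ```python
    import numpy as np

    # Unadjusted Langevin algorithm for pi ∝ exp(-V) with V(x) = |x|^2 / 2,
    # i.e. a standard 2D Gaussian target (illustrative assumption).
    rng = np.random.default_rng(0)

    def grad_V(x):
        return x                                  # gradient of V for this target

    x = 3.0 * rng.normal(size=(1000, 2))          # particles, mis-initialized on purpose
    step = 0.05
    for _ in range(500):
        x = x - step * grad_V(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    print(x.mean(axis=0), x.var(axis=0))          # ≈ 0 mean and ≈ 1 variance
    ```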

    Bio:
    Anna Korba is an assistant professor at ENSAE/CREST in the Statistics Department. Her main line of research is machine learning, and she has been working on kernel methods, optimal transport, optimization, particle systems and preference learning. At the moment, she is particularly interested in sampling and optimization methods. She received her PhD from Telecom ParisTech, under the guidance of Prof. Stephan Clémençon. Previously, she was a postdoctoral researcher with Arthur Gretton at University College London in the Gatsby Computational Neuroscience Unit.

  • Central Limit Theorems for Smooth Optimal Transport Maps

    On October 11, 2024, from 11:00 am to 12:00 pm
    Tudor Manole, MIT
    E18-304

    Abstract: One of the central objects in the theory of optimal transport is the Brenier map: the unique monotone transformation which pushes forward an absolutely continuous probability law onto any other given law. Recent work has identified a class of plugin estimators of Brenier maps which achieve the minimax L^2 risk and are simple to compute. In this talk, we show that such estimators obey pointwise central limit theorems. This provides a first step toward performing statistical inference for smooth Brenier maps in general dimension. We further show that these results have implications for the problem of estimating the 2-Wasserstein distance. In particular, they allow us to develop a higher-order semiparametric efficiency theory for the Wasserstein distance and, as a consequence, to derive an efficient estimator of the Wasserstein distance under nearly optimal smoothness conditions.

    This talk is based on joint work with Sivaraman Balakrishnan, Jonathan Niles-Weed, and Larry Wasserman.
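
    For context, here is a minimal plug-in estimate of the 2-Wasserstein distance between two empirical samples, using the POT library; this naive empirical estimator is known to be rate-suboptimal in higher dimension, which is part of what motivates the smooth plugin estimators in the talk. The sample sizes and distributions are illustrative assumptions.

    ```python
    import numpy as np
    import ot  # POT: Python Optimal Transport

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(300, 2))                # sample from N(0, I)
    xt = rng.normal(loc=1.0, size=(300, 2))       # sample from N((1,1), I)
    M = ot.dist(xs, xt)                           # squared-Euclidean cost matrix
    a = b = np.full(300, 1 / 300)                 # uniform empirical weights
    w2 = np.sqrt(ot.emd2(a, b, M))                # plug-in 2-Wasserstein estimate
    print(w2)                                     # population value here is sqrt(2)
    ```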

    Bio: Tudor Manole is a Norbert Wiener postdoctoral associate in the Statistics and Data Science Center at the Massachusetts Institute of Technology (MIT). He earned his PhD in Statistics at Carnegie Mellon University, where he was advised by Sivaraman Balakrishnan and Larry Wasserman. He is broadly interested in nonparametric statistics and statistical machine learning. Some specific research interests include statistical optimal transport, latent variable models, minimax hypothesis testing, and their applications to the physical sciences.

  • Large cycles for the interchange process

    On September 27, 2024, from 11:00 am to 12:00 pm
    Allan Sly, Princeton University
    E18-304

    Abstract: The interchange process $\sigma_T$ is a random permutation-valued stochastic process on a graph, evolving in time by transpositions on its edges at rate 1. On $\mathbb{Z}^d$, when $T$ is small, all the cycles of the permutation $\sigma_T$ are finite almost surely, but it is conjectured that infinite cycles appear in dimensions 3 and higher for large times. In this talk I will focus on the finite-volume case, where we establish that macroscopic cycles with Poisson–Dirichlet statistics appear for large times in dimensions 5 and above.
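
    A toy finite-volume simulation (my sketch, not from the talk): edges of a small discrete torus ring at rate 1, each ring transposes the permutation along that edge, and the cycle structure of $\sigma_T$ is read off at time $T$. The 2D torus, its size, and $T$ are illustrative assumptions.

    ```python
    import numpy as np

    # Interchange process on an L x L discrete torus, run up to time T.
    rng = np.random.default_rng(0)
    L, T = 6, 5.0
    n = L * L
    def idx(a, b):
        return (a % L) * L + (b % L)
    edges = [(idx(a, b), idx(a + 1, b)) for a in range(L) for b in range(L)] \
          + [(idx(a, b), idx(a, b + 1)) for a in range(L) for b in range(L)]
    sigma = np.arange(n)                          # identity permutation at time 0
    t = rng.exponential(1 / len(edges))           # each edge rings at rate 1
    while t <= T:
        i, j = edges[rng.integers(len(edges))]    # uniformly chosen ringing edge
        sigma[[i, j]] = sigma[[j, i]]             # transpose along that edge
        t += rng.exponential(1 / len(edges))
    # read off the cycle lengths of sigma_T
    seen, cycles = np.zeros(n, bool), []
    for s in range(n):
        length = 0
        while not seen[s]:
            seen[s] = True
            s = sigma[s]
            length += 1
        if length:
            cycles.append(length)
    print(sorted(cycles, reverse=True))           # large cycles emerge as T grows
    ```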

    Bio: Allan Sly is the Anthony H. P. Lee ’79 Professor of Mathematics at Princeton University. His research is in discrete probability theory and its applications to problems from statistical physics, theoretical computer science, and theoretical statistics. Most of his work centers on stochastic processes on networks in a range of different settings. Two major focuses are the analysis of mixing times of Markov chains, particularly Glauber dynamics, and the role phase transitions play in computational complexity and in probabilistic models more generally. He completed his PhD in Statistics at UC Berkeley in 2009 and was subsequently a postdoc in the Theory Group at Microsoft Research.

  • Model-agnostic covariate-assisted inference on partially identified causal effects

    On September 13, 2024, from 11:00 am to 12:00 pm
    Lihua Lei, Stanford University
    E18-304

    Abstract: Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notably, this property allows analysts to use the multiplier bootstrap to select covariates and models without sacrificing validity even if the true model is not included. Furthermore, if the conditional distributions are estimated at semiparametric rates, our approach matches the performance of an oracle with perfect knowledge of the outcome model. Finally, we propose an efficient computational framework, enabling implementation on many practical problems in causal inference.
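
    A minimal sketch of the duality idea (not the talk's full covariate-assisted procedure): for a partially identified estimand $\theta = E[h(Y(1), Y(0))]$, sharp bounds optimize over all couplings of the two observable marginals, which is exactly an optimal transport problem. The choice of $h$ and the samples below are illustrative assumptions; the POT library does the coupling optimization.

    ```python
    import numpy as np
    import ot  # POT: Python Optimal Transport

    rng = np.random.default_rng(0)
    y1 = 1.0 + rng.normal(size=200)               # treated-arm outcomes
    y0 = rng.normal(size=200)                     # control-arm outcomes
    H = ((y1[:, None] - y0[None, :]) > 0).astype(float)   # h = 1{Y(1) > Y(0)}
    a = b = np.full(200, 1 / 200)
    lower = ot.emd2(a, b, H)                      # minimize E[h] over couplings
    upper = -ot.emd2(a, b, -H)                    # maximize E[h] over couplings
    print(lower, upper)                           # identified interval for P(Y(1) > Y(0))
    ```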

    Bio: Lihua Lei is an Assistant Professor of Economics at the Stanford Graduate School of Business (GSB), an Assistant Professor of Statistics (by courtesy), and a Faculty Fellow at the Stanford Institute for Economic Policy Research (SIEPR). His research interests include distribution-free inference, causal inference, econometrics, and multiple testing.

  • Adversarial combinatorial bandits for imperfect-information sequential games

    On May 17, 2024, from 11:00 am to 12:00 pm
    Gabriele Farina, MIT
    E18-304

    Abstract:

    This talk will focus on learning policies for tree-form decision problems (extensive-form games) from adversarial feedback. In principle, one could convert learning in any extensive-form game (EFG) into learning in an equivalent normal-form game (NFG), that is, a multi-armed bandit problem with one arm per tree-form policy. However, doing so comes at the cost of an exponential blowup of the strategy space. So, progress on NFGs and EFGs has historically followed separate tracks, with the EFG community often having to catch up with advances (e.g., last-iterate convergence and predictive regret bounds) from the larger NFG community. In this talk, I will show that the combinatorial structure of EFGs enables simulating the multiplicative weights update algorithm over the set of tree-form strategies efficiently (i.e., in linear time in the size of the game tree instead of the number of tree-form policies) using a kernel trick. This reduction closes several standing gaps between NFG and EFG learning, by enabling direct, black-box transfer to EFGs of desirable properties of learning dynamics that were so far known to be achievable only in NFGs.
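
    For background, a minimal sketch of the multiplicative weights update over explicitly enumerated arms; the talk's contribution is that this same update can be simulated over the exponentially many tree-form policies of an EFG in time linear in the game tree via a kernel trick, which this toy snippet does not reproduce. Parameters are assumptions.

    ```python
    import numpy as np

    # Multiplicative weights (Hedge) over K explicit arms with adversarial losses.
    rng = np.random.default_rng(0)
    K, eta, T = 5, 0.1, 1000
    w = np.ones(K)                                # one weight per arm (policy)
    for _ in range(T):
        p = w / w.sum()                           # mixed strategy played this round
        loss = rng.random(K)                      # losses revealed by the adversary
        w *= np.exp(-eta * loss)                  # multiplicative weights update
    print(w / w.sum())                            # final mixed strategy
    ```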

    Bio:

    Gabriele Farina is an Assistant Professor in the Laboratory for Information and Decision Systems at MIT EECS, and a member of the Operations Research Center. Before joining LIDS, he spent a year as a Research Scientist at Meta AI. His research interests are at the intersection of operations research, economics, and computation, with a focus on learning and optimization methods for sequential decision-making under imperfect information. He obtained his Ph.D. in computer science from Carnegie Mellon University.

  • Matrix displacement convexity and intrinsic dimensionality

    On May 10, 2024, from 11:00 am to 12:00 pm
    Yair Shenfeld, Brown University
    E18-304

    Abstract:
    The space of probability measures endowed with the optimal transport metric has a rich structure with applications in probability, analysis, and geometry. The notion of (displacement) convexity in this space was discovered by McCann, and forms the backbone of this theory. I will introduce a new, and stronger, notion of displacement convexity which operates on the matrix level. The motivation behind this definition is to capture the intrinsic dimensionality of probability measures, which could have very different behaviors along different directions in space. I will show that a broad class of flows satisfy matrix displacement convexity: heat flow, optimal transport, entropic interpolation, mean-field games, and semiclassical limits of non-linear Schrödinger equations. This leads to intrinsic dimensional functional inequalities which provide a systematic improvement on numerous classical functional inequalities.
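
    For reference, McCann's classical (scalar) notion, which the talk strengthens to a matrix-valued version, stated in standard form: a functional $F$ on the Wasserstein space is displacement convex when it is convex along optimal-transport geodesics.

    ```latex
    % Displacement interpolation between rho_0 and rho_1, with T the
    % optimal (Brenier) map pushing rho_0 onto rho_1:
    \[
      \rho_t \;=\; \bigl((1-t)\,\mathrm{Id} + t\,T\bigr)_{\#}\,\rho_0 ,
      \qquad t \in [0,1].
    \]
    % F is displacement convex if, along every such geodesic,
    \[
      F(\rho_t) \;\le\; (1-t)\,F(\rho_0) + t\,F(\rho_1).
    \]
    ```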

    Bio:
    Yair Shenfeld is an Assistant Professor of Applied Mathematics at Brown University. Previously, he was a C.L.E. Moore Instructor and an NSF postdoctoral fellow in the Mathematics Department at MIT. He completed his PhD at Princeton University. He works in high-dimensional probability and its interactions with analysis, geometry, and mathematical physics. His current research interests include stochastic analysis and functional inequalities, optimal transport, renormalization group methods, and extremal problems in convex geometry.

  • Consensus-based optimization and sampling

    On May 3, 2024, from 11:00 am to 12:00 pm
    Franca Hoffmann, California Institute of Technology
    E18-304

    Abstract: Particle methods provide a powerful paradigm for solving complex global optimization problems, leading to highly parallelizable algorithms. Despite their widespread and growing adoption, the theory underpinning their behavior has been based mainly on meta-heuristics. In application settings involving black-box procedures, or where gradients are too costly to obtain, one relies on derivative-free approaches instead. This talk will focus on two recent techniques: consensus-based optimization and consensus-based sampling. We explain how these methods can be used for the following two goals: (i) generating approximate samples from a given target distribution, and (ii) optimizing a given objective function. They circumvent the need for gradients via Laplace’s principle. We investigate the properties of this family of methods for various parameter choices and present an overview of recent advances in the field.
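
    A minimal sketch of the standard isotropic consensus-based optimization dynamics (all hyperparameter values below are assumptions): particles drift toward a Laplace-principle weighted average of themselves and diffuse in proportion to their distance from it, so no gradients of the objective are needed.

    ```python
    import numpy as np

    # Isotropic CBO on a toy objective with minimizer at (1, 1, 1).
    rng = np.random.default_rng(0)
    f = lambda X: np.sum((X - 1.0) ** 2, axis=1)  # objective, evaluated per particle
    N, dim = 100, 3
    beta, lam, sigma, dt = 30.0, 1.0, 0.7, 0.02   # assumed hyperparameters
    X = 3.0 * rng.normal(size=(N, dim))
    for _ in range(2000):
        fx = f(X)
        w = np.exp(-beta * (fx - fx.min()))       # stabilized Laplace weights
        m = (w[:, None] * X).sum(axis=0) / w.sum()  # consensus point m_beta
        d = X - m
        noise = np.linalg.norm(d, axis=1, keepdims=True) * rng.normal(size=X.shape)
        X = X - lam * d * dt + sigma * np.sqrt(dt) * noise
    print(m)                                      # ≈ (1, 1, 1): consensus near the minimizer
    ```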

    Bio: Prof. Franca Hoffmann’s research interests lie at the interface of model-driven and data-driven approaches. She works on the development and application of mathematical tools for partial differential equation (PDE) analysis and data analysis.
    Broadly, Franca’s interests in the area of partial differential equations revolve around non-linear drift-diffusion equations, kinetic theory, many-particle systems and their mean-field limits, gradient flows, entropy methods, optimal transport, functional inequalities, parabolic and hyperbolic scaling techniques, and hypocoercivity. In the area of data analysis, Franca is working on graph-based learning and on the development and analysis of optimization and sampling algorithms. The use of graph Laplacians in graph-based learning allows for a rigorous mathematical analysis of unsupervised and semi-supervised learning algorithms, and their continuum counterparts can be studied using tools from PDE theory. Optimization and sampling are at the heart of parameter estimation and uncertainty quantification in Bayesian inference, and are used in many modern machine learning approaches.
    Franca works at the intersection of these fields, not only exploring what mathematical analysis can do for applications, but also what applications can do for mathematics.

    Franca obtained her master’s degree in mathematics from Imperial College London (UK) and holds a PhD from the Cambridge Centre for Analysis at the University of Cambridge (UK). She held the position of von Kármán Instructor at Caltech from 2017 to 2020, then joined the University of Bonn (Germany) as a Junior Professor and Quantum Leap Africa in Kigali, Rwanda (African Institute for Mathematical Sciences) as AIMS-Carnegie Research Chair in Data Science, before arriving at the California Institute of Technology as an Assistant Professor in 2022.

  • Emergent outlier subspaces in high-dimensional stochastic gradient descent

    On April 26, 2024, from 11:00 am to 12:00 pm
    Reza Gheissari, Northwestern University
    E18-304

    Abstract: It has been empirically observed that the spectrum of neural network Hessians after training has a bulk concentrated near zero and a few outlier eigenvalues. Moreover, the eigenspaces associated with these outliers have been linked to a low-dimensional subspace in which most of the training occurs, and this implicit low-dimensional structure has been used as a heuristic to explain the success of high-dimensional classification. We will describe recent rigorous results in this direction on the Hessian spectrum over the course of training by SGD in high-dimensional classification tasks with one- and two-layer networks. We focus on the separation of outlier eigenvalues from the bulk, and the subsequent crystallization of the outlier eigenvectors. Based on joint work with Ben Arous, Huang, and Jagannath.
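
    As empirical context (my sketch, unrelated to the paper's proofs): outlier eigenvalues of a loss Hessian are typically probed by power iteration on Hessian-vector products, which never forms the Hessian explicitly. The toy logistic-regression loss below is an assumption.

    ```python
    import numpy as np

    # Power iteration on Hessian-vector products for a logistic-regression loss.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    y = rng.integers(0, 2, size=500)

    def grad(w):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        return X.T @ (p - y) / len(y)

    def hvp(w, v, eps=1e-5):                      # finite-difference Hessian-vector product
        return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

    w = rng.normal(size=20)                       # point at which the Hessian is probed
    v = rng.normal(size=20)
    v /= np.linalg.norm(v)
    for _ in range(200):                          # power iteration toward the top eigenpair
        hv = hvp(w, v)
        lam = v @ hv                              # Rayleigh quotient estimate
        v = hv / np.linalg.norm(hv)
    print("top Hessian eigenvalue ≈", lam)
    ```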

    Bio: Reza Gheissari is an assistant professor of mathematics at Northwestern University. Prior to joining Northwestern, he obtained his Ph.D. at NYU’s Courant Institute and was a Miller Postdoctoral Fellow at UC Berkeley. His research area is probability theory, with particular interest in out-of-equilibrium behavior of Markov chains, and relations to sampling, optimization, and learning problems in high dimensions.
