Statistics and Data Science Seminar


  • Adaptivity in Domain Adaptation and Friends

    No dates for this event
    Samory Kpotufe, Columbia University
    E18-304

    Abstract:
    Domain adaptation, transfer, multitask, meta, few-shot, representation, or lifelong learning … these are all important recent directions in ML that touch at the core of what we might mean by ‘AI’. As these directions all concern learning in heterogeneous and ever-changing environments, they share a central question: what information one data distribution may carry about another, critically, in the context of a given estimation problem, e.g., classification, regression, bandits, etc.

    Our understanding of these problems is still rather fledgling. I plan to present some recent positive results along with some negative ones. On one hand, recent measures of discrepancy between distributions, fine-tuned to given estimation problems (classification, bandits, etc.), offer a more optimistic picture than existing probability metrics (e.g., Wasserstein, TV) or divergences (KL, Renyi, etc.) in terms of achievable rates. On the other hand, when considering seemingly simple extensions of basic settings, e.g., extensions to multiple choices of source datasets (as in multitask or multi-source learning), or extensions to multiple prediction models to transfer (i.e., model selection under distribution shift), it turns out that minimax oracle rates are not always adaptively achievable, i.e., domain knowledge is necessary. Such negative results suggest that these problems are more structured in practice than what the usual formalisms are so far able to capture.

    The talk will be based on joint work with collaborators over the last few years, namely G. Martinet, S. Hanneke, J. Suk, and Y. Mahdaviyeh.

    Bio:
    Samory Kpotufe is an Associate Professor in Statistics at Columbia University. He works in statistical machine learning, with an emphasis on common nonparametric methods (e.g., kNN, trees, kernel averaging) and a particular interest in adaptivity (i.e., how to automatically leverage beneficial aspects of data as opposed to designing specifically for each scenario). He has previously held positions at the Max Planck Institute in Germany, the Toyota Technological Institute at Chicago, and Princeton University, and was recently a visiting member at the Institute for Advanced Study in Princeton.

    Find out more »: Adaptivity in Domain Adaptation and Friends
  • Adaptive Decision Tree Methods

    On April 21, 2023, from 11:00 am to 12:00 pm
    Matias Cattaneo, Princeton University
    E18-304

    Abstract:
    This talk is based on two recent papers: 1. “On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation” and 2. “Convergence Rates of Oblique Regression Trees for Flexible Function Libraries”.

    1. Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference are conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm, even with pruning. Instead, the convergence may be poly-logarithmic or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered.

    2. We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While this methodology has garnered significant attention from the computer science and optimization communities since the mid-80s, the advantages oblique trees offer over their axis-aligned counterparts remain only empirically justified, and explanations for their success are largely based on heuristics. Filling this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline to compare and contrast decision trees with other less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade off interpretability for accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve predictive accuracy similar to that of neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools in the literature. Our results rely on (arguably surprising) connections between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms), which may be of independent theoretical interest.
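
    To make the axis-aligned versus oblique distinction concrete, here is a minimal sketch in plain NumPy, under simplifying assumptions (squared-error impurity, exhaustive threshold search, random candidate directions for the oblique case), of a single split search in both styles. It illustrates the general idea only and is not the estimator analyzed in the talk.

    ```python
    # Minimal sketch of one node's split search: axis-aligned vs. oblique.
    # Squared-error impurity; the oblique search samples random directions,
    # since exact optimization over splitting hyperplanes is combinatorial.
    import numpy as np

    def sse_after_split(z, y, t):
        """Total squared error of a constant fit on each side of z <= t."""
        left, right = y[z <= t], y[z > t]
        return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

    def best_axis_aligned_split(X, y):
        """Splits of the form x_j <= t, searching over single coordinates j."""
        best_err, best_split = np.inf, None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                err = sse_after_split(X[:, j], y, t)
                if err < best_err:
                    best_err, best_split = err, (j, t)
        return best_err, best_split

    def best_oblique_split(X, y, n_directions=50, seed=0):
        """Splits of the form w @ x <= t, searching over random directions w."""
        rng = np.random.default_rng(seed)
        best_err, best_split = np.inf, None
        for _ in range(n_directions):
            w = rng.standard_normal(X.shape[1])
            z = X @ w
            for t in np.unique(z)[:-1]:
                err = sse_after_split(z, y, t)
                if err < best_err:
                    best_err, best_split = err, (w, t)
        return best_err, best_split

    # Toy data whose signal depends on x1 + x2: a single oblique cut captures it,
    # whereas axis-aligned trees need several splits to approximate the boundary.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float) + 0.1 * rng.standard_normal(200)
    print(best_axis_aligned_split(X, y)[0], best_oblique_split(X, y)[0])
    ```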

    Bio: Matias D. Cattaneo is a Professor of Operations Research and Financial Engineering (ORFE) at Princeton University, where he is also an Associated Faculty in the Department of Economics, the Center for Statistics and Machine Learning (CSML), and the Program in Latin American Studies (PLAS). His research spans econometrics, statistics, data science and decision science, with particular interests in program evaluation and causal inference. As part of his main research agenda, he has developed novel semi-/non-parametric, high-dimensional, and machine learning inference procedures with demonstrably superior robustness to tuning parameter and other implementation choices. Matias was elected Fellow of the Institute of Mathematical Statistics (IMS) in 2022.

    Find out more »: Adaptive Decision Tree Methods
  • Free Discontinuity Design (joint w/ David van Dijcke)

    On April 7, 2023, from 11:00 am to 12:00 pm
    Florian Gunsilius, University of Michigan
    E18-304

    Abstract:
    Regression discontinuity design (RDD) is a quasi-experimental impact evaluation method ubiquitous in the social and applied health sciences. It aims to estimate average treatment effects of policy interventions by exploiting jumps in outcomes induced by cut-off assignment rules. Here, we establish a correspondence between the RDD setting and free discontinuity problems, in particular the celebrated Mumford-Shah model in image segmentation. The Mumford-Shah model is non-convex and hence, in general, admits local minima that are not global. We circumvent this issue by relying on well-known convex relaxations based on the calibration method to generate global solutions. We derive deterministic and statistical convergence properties of this convex relaxation and demonstrate the utility of the resulting free discontinuity design (FDD) estimator in two empirical applications. Unlike canonical RDD estimators, our FDD estimator (i) extends to settings with running variables of any dimension, most notably the spatial setting; (ii) does not require the location of the boundary (the cut-off) to be known precisely, but rather estimates it jointly with the response function; and (iii) does not depend on bandwidth selection rules.
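
    For readers less familiar with the canonical setting that FDD generalizes, here is a minimal sketch (an illustration only, not the FDD estimator of the talk) of a standard sharp RDD estimate: local linear regression on each side of a known scalar cutoff with a hand-picked bandwidth, i.e., exactly the ingredients that points (i)-(iii) above remove.

    ```python
    # Minimal sketch of a canonical sharp RDD estimate: local linear fits on each
    # side of a *known* scalar cutoff, with a hand-picked bandwidth and uniform
    # kernel. The FDD estimator of the talk dispenses with these restrictions.
    import numpy as np

    def sharp_rdd_effect(x, y, cutoff=0.0, bandwidth=0.5):
        """Difference of local-linear intercepts at the cutoff."""
        def fit_at_cutoff(mask):
            xs, ys = x[mask] - cutoff, y[mask]
            design = np.column_stack([np.ones_like(xs), xs])
            coef, *_ = np.linalg.lstsq(design, ys, rcond=None)
            return coef[0]  # fitted value of the regression line at x = cutoff
        left = (x < cutoff) & (x >= cutoff - bandwidth)
        right = (x >= cutoff) & (x <= cutoff + bandwidth)
        return fit_at_cutoff(right) - fit_at_cutoff(left)

    # Toy data: the outcome jumps by 2.0 at the cutoff.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 2000)
    y = 1.0 + 0.5 * x + 2.0 * (x >= 0) + 0.3 * rng.standard_normal(2000)
    print(sharp_rdd_effect(x, y))  # roughly 2.0
    ```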

    Bio:
    Florian Gunsilius is an Assistant Professor of Econometrics at the University of Michigan, Ann Arbor. He graduated from Brown University with a PhD in economics, receiving the Joukowsky family foundation dissertation award in the social sciences. Before joining the faculty at the University of Michigan, he spent one year as a Visiting Postdoctoral Associate at the Department of Economics at MIT. His work focuses on nonparametric approaches to statistical identification, estimation, and inference. His current focus is on developing geometric methods for causal inference.

    Find out more »: Free Discontinuity Design (joint w/ David van Dijcke)
  • Variational methods in reinforcement learning

    No dates for this event
    Martin Wainwright, MIT
    E18-304

    Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty. At its heart lies the Bellman optimality operator, whose unique fixed point specifies an optimal policy and value function. In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees. For policy evaluation problems based on online data, we present Krylov-Bellman boosting, which combines ideas from Krylov methods with non-parametric boosting. For policy optimization problems based on historical data, we describe a variational formulation that combines Galerkin approximation and mirror descent. We discuss an oracle inequality that trades off optimally between the value and statistical uncertainty.

    Based on joint work with Eric Xia (MIT), Andrea Zanette (Berkeley) and Emma Brunskill (Stanford).
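
    As a reminder of the object at the center of both methods, here is a minimal tabular sketch (a made-up two-state, two-action MDP, not an example from the talk) of the Bellman optimality operator and of value iteration converging to its unique fixed point; the variational methods above are designed for settings where such exact tabular computation is unavailable.

    ```python
    # Minimal tabular sketch: the Bellman optimality operator T and value
    # iteration V <- T(V), which converges to T's unique fixed point V*.
    import numpy as np

    gamma = 0.9
    # Toy MDP: P[a, s, s'] = transition probability; R[s, a] = expected reward.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    def bellman_optimality(V):
        """(T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]."""
        q = R + gamma * np.einsum('ast,t->sa', P, V)
        return q.max(axis=1)

    V = np.zeros(2)
    for _ in range(500):          # T is a gamma-contraction, so this converges
        V = bellman_optimality(V)
    print(V, bellman_optimality(V) - V)  # at the fixed point the residual is ~0
    ```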

    Bio: Martin Wainwright is the Cecil H. Green Professor of EECS as well as a Principal Investigator in LIDS. He was previously the Howard Friesen Chair, with a joint appointment between the Department of Statistics and the Department of EECS at the University of California, Berkeley. He has made seminal contributions to high-dimensional statistics, control and optimization, and statistical machine learning.

    Among other awards (including the George M. Sprowls Prize for the thesis he developed at LIDS), he has received the COPSS Presidents’ Award (2014) from the Joint Statistical Societies, the David Blackwell Lectureship (2017) and the Medallion Lectureship (2013) from the Institute of Mathematical Statistics, and Best Paper awards from the IEEE Signal Processing Society and the IEEE Information Theory Society. He was a section lecturer at the International Congress of Mathematicians in 2014.

    Find out more »: Variational methods in reinforcement learning
  • Spectral Independence: A New Tool to Analyze Markov Chains

    On March 10, 2023, from 11:00 am to 12:00 pm
    Kuikui Liu, University of Washington
    E18-304

    Abstract:
    Sampling from high-dimensional probability distributions is a fundamental and challenging problem encountered throughout science and engineering. One of the most popular approaches to tackle such problems is the Markov chain Monte Carlo (MCMC) paradigm. While MCMC algorithms are often simple to implement and widely used in practice, analyzing the rate of convergence to stationarity, i.e. the “mixing time”, remains a challenging problem in many settings.

    I will describe a new technique based on pairwise correlations called “spectral independence”, which has been used to break long-standing barriers and resolve several decades-old open problems in MCMC theory. Through this technique, we’ve further established new connections with other areas such as statistical physics, geometry of polynomials, the emerging study of high-dimensional expanders, and more. Applications include discrete log-concave distributions, graphical models, and concentration inequalities. Based on several joint works with Dorna Abdolazimi, Nima Anari, Zongchen Chen, Shayan Oveis Gharan, Nitya Mani, Ankur Moitra, Eric Vigoda, Cynthia Vinzant, and June Vuong.
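
    To fix ideas about the kind of chain these techniques analyze, here is a minimal sketch (my illustration, not code from the talk) of Glauber dynamics for the hardcore (independent set) model on a small graph: each step resamples one vertex's occupancy conditional on its neighbors. Spectral independence is one route to bounding how quickly such a chain mixes; the sketch only simulates the chain.

    ```python
    # Minimal sketch: Glauber dynamics for the hardcore model on a graph.
    # State = an independent set; each step picks a vertex uniformly and
    # resamples its occupancy conditional on the rest of the configuration.
    import random

    def glauber_hardcore(adj, lam=1.0, steps=10_000, seed=0):
        """adj: dict vertex -> set of neighbors. Returns the final independent set."""
        rng = random.Random(seed)
        occupied = set()
        vertices = list(adj)
        for _ in range(steps):
            v = rng.choice(vertices)
            occupied.discard(v)
            # v may be occupied only if no neighbor is occupied; conditionally,
            # P(v occupied) = lam / (1 + lam).
            if not (adj[v] & occupied) and rng.random() < lam / (1 + lam):
                occupied.add(v)
        return occupied

    # Toy example: a 4-cycle.
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(glauber_hardcore(cycle, lam=1.0))
    ```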

    Bio:
    Kuikui Liu is a Postdoctoral Associate at MIT CSAIL, previously in the Theory Group at the Paul G. Allen School of Computer Science and Engineering at the University of Washington (UW CSE). His research interests are in high-dimensional geometry and the analysis of Markov chains. He develops and uses mathematical tools from fields such as high-dimensional expanders, geometry of polynomials, and statistical physics.

    Find out more »: Spectral Independence: A New Tool to Analyze Markov Chains
  • Spectral pseudorandomness and the clique number of the Paley graph

    On March 3, 2023, from 11:00 am to 12:00 pm
    Dmitriy (Tim) Kunisky, Yale University
    E18-304

    Abstract:

    The Paley graph is a classical number-theoretic construction of a graph that is believed to behave “pseudorandomly” in many regards. Accurately bounding the clique number of the Paley graph is a long-standing open problem in number theory, with applications to several other questions about the statistics of finite fields. I will present recent results studying the application of convex optimization and spectral graph theory to this problem, which involve understanding the extent to which the Paley graph is “spectrally pseudorandom” in various senses.

    First, I will show a lower bound on the degree 4 sum-of-squares relaxation of the clique number of the Paley graph, derandomizing an analogous result for Erdős-Rényi random graphs due to Deshpande and Montanari (2015). On the other hand, I will offer some evidence that this relaxation may in fact yield bounds improving on the state of the art, thanks to the spectrum of the Paley graph being sufficiently different from that of an Erdős-Rényi graph.

    Second, I will show that certain deterministic induced subgraphs of the Paley graph have the same limiting spectrum as induced subgraphs on random sets of vertices of the same size. I will discuss how this phenomenon arises as a consequence of asymptotic freeness (in the sense of free probability) of certain matrices associated with the Paley graph. Finally, I will present conjectures describing a stronger analogy between random and pseudorandom deterministic induced subgraphs that would lead to clique number bounds improving on the state of the art.

    Based partly on joint work with Xifan Yu.
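
    For concreteness, the sketch below builds the standard Paley graph (vertices are the elements of F_p for a prime p ≡ 1 mod 4; two vertices are adjacent when their difference is a nonzero quadratic residue, a symmetric relation precisely because p ≡ 1 mod 4) and computes its clique number by brute force for a tiny p. The talk concerns what can be certified when such exhaustive search is hopeless.

    ```python
    # Minimal sketch: the Paley graph on F_p (p prime, p ≡ 1 mod 4) and a
    # brute-force clique number, feasible only for very small p.
    from itertools import combinations

    def paley_graph(p):
        """Adjacency sets: i ~ j iff (i - j) is a nonzero quadratic residue mod p."""
        residues = {(x * x) % p for x in range(1, p)}
        return {i: {j for j in range(p) if j != i and (i - j) % p in residues}
                for i in range(p)}

    def clique_number(adj):
        n = len(adj)
        for size in range(n, 0, -1):
            for subset in combinations(range(n), size):
                if all(v in adj[u] for u, v in combinations(subset, 2)):
                    return size
        return 0

    print(clique_number(paley_graph(13)))  # prints 3 for p = 13
    ```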

    Bio:

    Dmitriy (Tim) Kunisky is a postdoctoral associate at Yale University’s Department of Computer Science, hosted by Daniel Spielman. Before joining Yale, he received his PhD in Mathematics from the Courant Institute at New York University, advised by Afonso Bandeira and Gerard Ben Arous. He is broadly interested in both heuristic and rigorous methods for assessing the computational hardness of random optimization problems, especially problems that can be approached using convex optimization. More recently, he has also studied universality and derandomization techniques for showing that the phenomena governing random problems also occur in semi-random and deterministic settings.

    Find out more »: Spectral pseudorandomness and the clique number of the Paley graph
  • On the statistical cost of score matching

    On February 24, 2023, from 11:00 am to 12:00 pm
    Andrej Risteski, Carnegie Mellon University
    E18-304

    Abstract:
    Energy-based models are a recent class of probabilistic generative models wherein the distribution being learned is parametrized up to a constant of proportionality (i.e., a partition function). Fitting such models using maximum likelihood (i.e., finding the parameters which maximize the probability of the observed data) is computationally challenging, as evaluating the partition function involves a high-dimensional integral. Thus, newer incarnations of this paradigm instead optimize other losses which obviate the need to evaluate partition functions. Prominent examples include score matching (in which we fit the score of the data distribution) and noise contrastive estimation (in which we set up a classification problem to distinguish data from noise).

    What’s gained with these approaches is tractable gradient-based algorithms. What’s lost is less clear: for example, since maximum likelihood is asymptotically optimal in terms of statistical efficiency, how suboptimal are losses like score matching? We will provide partial answers to this question — and in the process uncover connections between geometric properties of the distribution (Poincaré and isoperimetric constants) and the statistical efficiency of score matching.

    Based primarily on https://arxiv.org/abs/2210.00726, https://arxiv.org/abs/2210.00189, and https://arxiv.org/abs/2110.11271.
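
    As a toy worked example of the objects involved (my illustration, not taken from the papers above), consider fitting the precision θ of a centered one-dimensional Gaussian by Hyvärinen score matching. The model score is s_θ(x) = -θx, and the integrated-by-parts objective E[½ s_θ(x)² + s_θ'(x)] = E[½ θ²x² - θ] requires no partition function; in this particular model its minimizer coincides with the maximum likelihood estimate, whereas the talk concerns how much efficiency is lost in models where the two estimators differ.

    ```python
    # Toy illustration: Hyvarinen score matching for the precision theta of a
    # centered 1-D Gaussian, p_theta(x) ∝ exp(-theta * x**2 / 2).
    # Score: s_theta(x) = -theta * x.  Objective (after integration by parts):
    #   J(theta) = E[0.5 * theta**2 * x**2 - theta], with no partition function.
    import numpy as np

    rng = np.random.default_rng(0)
    true_theta = 4.0
    x = rng.normal(scale=1 / np.sqrt(true_theta), size=5000)

    def J(theta):
        return np.mean(0.5 * theta ** 2 * x ** 2 - theta)

    grid = np.linspace(0.1, 10.0, 2000)
    theta_sm = grid[np.argmin([J(t) for t in grid])]   # score matching estimate
    theta_mle = 1.0 / np.mean(x ** 2)                  # closed-form MLE
    print(theta_sm, theta_mle)                         # both close to 4.0 here
    ```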

    Bio:
    Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. Prior to joining CMU, he was a Norbert Wiener Research Fellow jointly in the Applied Math department and IDSS at MIT. He received his PhD in Computer Science from Princeton University. His research interests lie at the intersection of machine learning, statistics, and theoretical computer science, spanning topics like (probabilistic) generative models, algorithmic tools for learning and inference, representation and self-supervised learning, out-of-distribution generalization, and applications of neural approaches to natural language processing and scientific domains. Andrej is the recipient of an Amazon Research Award (“Causal + Deep Out-of-Distribution Learning”) and an NSF CAREER Award (“Theoretical Foundations of Modern Machine Learning Paradigms: Generative and Out-of-Distribution”).

    Find out more »: On the statistical cost of score matching
  • Generative Models, Normalizing Flows, and Monte Carlo Samplers

    On February 17, 2023, from 11:00 am to 12:00 pm
    Eric Vanden-Eijnden, New York University
    E18-304

    Abstract:
    Contemporary generative models used in the context of unsupervised learning have primarily been designed around the construction of a map between two probability distributions that transforms samples from the first into samples from the second. Advances in this domain have been governed by the introduction of algorithms or inductive biases that make learning this map, and the Jacobian of the associated change of variables, more tractable. The challenge is to choose what structure to impose on the transport to best reach a complex target distribution from a simple one used as base, while maintaining computational efficiency. In this talk, I will formalize this problem and discuss how to construct such transport maps by introducing a continuous-time normalizing flow whose velocity is the minimizer of a simple quadratic loss expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or the target, and to estimate their likelihood. In addition, this flow can be optimized to minimize the path length in the Wasserstein-2 metric, thereby paving the way for building transport maps that are optimal in the sense of Monge-Ampère. I will also contextualize this approach in its relation to score-based diffusion models that have gained a lot of popularity lately. Finally, I will discuss how such normalizing flows can be used in the context of Monte Carlo sampling, with applications to the calculation of free energies and Bayes factors.

    Based on joint work with Michael Albergo, Marylou Gabrié, and Grant Rotskoff.
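
    Below is a minimal sketch of one common instance of this construction: a linear interpolant between base and target samples, with the velocity field learned by a simple quadratic regression loss and then integrated to generate samples. It is an illustration written for this listing, not necessarily the exact formulation of the talk, and the target distribution is a made-up toy example.

    ```python
    # Minimal sketch: learn the velocity of a continuous-time flow by minimizing
    # a quadratic loss over the linear interpolant x_t = (1 - t) x0 + t x1.
    import torch
    import torch.nn as nn

    dim = 2
    velocity = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))
    opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

    def sample_target(n):
        # Hypothetical toy target: a Gaussian centered at (3, 3).
        return torch.randn(n, dim) + 3.0

    for step in range(2000):
        x0 = torch.randn(512, dim)          # base samples
        x1 = sample_target(512)             # target samples
        t = torch.rand(512, 1)
        xt = (1 - t) * x0 + t * x1          # linear interpolant
        target_velocity = x1 - x0           # time derivative of the interpolant
        pred = velocity(torch.cat([xt, t], dim=1))
        loss = ((pred - target_velocity) ** 2).mean()   # simple quadratic loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Generate: integrate dx/dt = v(x, t) from t = 0 (base) to t = 1 (target).
    x = torch.randn(1000, dim)
    with torch.no_grad():
        for k in range(100):
            t = torch.full((1000, 1), k / 100)
            x = x + 0.01 * velocity(torch.cat([x, t], dim=1))
    print(x.mean(dim=0))   # should be near (3, 3)
    ```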

    Bio:
    Eric Vanden-Eijnden (https://wp.nyu.edu/courantinstituteofmathematicalsciences-eve2/) is a Professor of Mathematics at the Courant Institute of Mathematical Sciences, New York University. His research focuses on the mathematical and computational aspects of statistical mechanics, with applications to complex dynamical systems arising in molecular dynamics, materials science, atmosphere-ocean science, fluid dynamics, and neural networks. He is also interested in the mathematical foundations of machine learning (ML) and the applications of ML in scientific computing. He is known for the development and analysis of multiscale numerical methods for systems whose dynamics span a wide range of spatio-temporal scales. He is the winner of the Germund Dahlquist Prize and the J.D. Crawford Prize, and a recipient of the Vannevar Bush Faculty Fellowship. He was a plenary speaker at the 2015 International Congress of Industrial and Applied Mathematics (ICIAM) in Beijing and an invited speaker at the 2022 International Congress of Mathematicians (ICM).

    Find out more »: Generative Models, Normalizing Flows, and Monte Carlo Samplers
  • Coding convex bodies under Gaussian noise, and the Wills functional

    On December 2, 2022, from 11:00 am to 12:00 pm
    Jaouad Mourtada, ENSAE Paris
    E18-304

    Abstract:
    We consider the problem of sequential probability assignment in the Gaussian setting, where one aims to predict (or equivalently compress) a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a general domain. First, in the case of a convex constraint set K, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of K. We then establish a comparison inequality for the minimax regret in the general nonconvex case, which underlines the metric nature of this quantity and generalizes the Slepian-Sudakov-Fernique comparison principle for the Gaussian width. Motivated by this inequality, we present a sharp (up to universal constants) characterization of the considered functional for a general nonconvex set, in terms of metric complexity measures. This implies isomorphic estimates for the log-Laplace transform of the intrinsic volume sequence of a convex body. We finally relate and contrast our findings with classical asymptotic results in information theory.

    Bio:
    Jaouad Mourtada is an Assistant Professor in the Department of Statistics at ENSAE/CREST. Prior to that, he was a postdoctoral researcher at the Laboratory for Computational and Statistical Learning at the University of Genoa. He received his PhD from the Center for Applied Mathematics (CMAP) at École Polytechnique. His research interests are at the intersection of statistics and learning theory, specifically in understanding the complexity of prediction and estimation problems.

    Find out more »: Coding convex bodies under Gaussian noise, and the Wills functional
  • Distance-based summaries and modeling of evolutionary trees

    On November 18, 2022, from 11:00 am to 12:00 pm
    Julia Palacios, Stanford University
    E18-304

    Abstract: Phylogenetic trees are mathematical objects of great importance used to model hierarchical data and evolutionary relationships, with applications in many fields including evolutionary biology and genetic epidemiology. Bayesian phylogenetic inference usually explores the posterior distribution of trees via Markov chain Monte Carlo methods; however, assessing uncertainty and summarizing distributions remains challenging for these types of structures. In this talk, I will first introduce a distance metric on the space of unlabeled ranked tree shapes and genealogies. I will then use it to define several summary statistics such as the Fréchet mean, variance, and interquartile sets. Finally, I will provide an efficient combinatorial optimization algorithm for computation and show the applicability of our summaries for studying popular tree distributions and for comparing the SARS-CoV-2 evolutionary trees across different locations during the COVID-19 epidemic in 2020.

    Bio: Dr. Julia A. Palacios is an Assistant Professor in the Departments of Statistics and Biomedical Data Science, and, by courtesy, of Biology at Stanford University. Professor Palacios completed her PhD in Statistics at the University of Washington in 2013. She did a joint postdoc at Harvard University and Brown University before joining Stanford. In her research, Professor Palacios seeks to provide statistically rigorous answers to concrete, data-driven questions in population genetics, epidemiology, and comparative genomics, often involving probabilistic modeling of evolutionary forces and the development of computationally tractable methods that are applicable to big data problems.

    Find out more »: Distance-based summaries and modeling of evolutionary trees