Statistics and Data Science Seminar


  • Quantile and Probability Curves without Crossing

    No dates for this event
    Victor Chernozhukov (MIT Econ)

    The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement — that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated non-monotone model, and then taking the resulting conditional quantile curves that by construction are monotone in the probability. This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic rearrangement of the original non-monotone function. It is shown that the monotonized curves are closer to the true curves in finite samples, for any sample size. Under correct specification, the rearranged conditional quantile curves have the same asymptotic distribution as the original non-monotone curves. Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize the estimates of conditional distribution functions. The results are derived by establishing the compact (Hadamard) differentiability of the monotonized quantile and probability curves with respect to the original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so, the compact differentiability of the rearrangement-related operators is established.
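
    The rearrangement step described above is simple to sketch: on a grid of probabilities, sorting the values of a non-monotone estimated quantile curve yields its monotone rearrangement. A minimal illustration (the curve below is a hypothetical non-monotone fit, not an estimator from the paper):

```python
import numpy as np

def rearrange(q_vals):
    """Monotone rearrangement on an equally spaced probability grid:
    sorting the fitted values makes the curve increasing in u."""
    return np.sort(q_vals)

u = np.linspace(0.05, 0.95, 19)            # probability grid
q = 1.0 + 2.0 * u + 0.4 * np.sin(12 * u)   # hypothetical non-monotone fit
q_mono = rearrange(q)

assert np.all(np.diff(q_mono) >= 0)        # monotone by construction
assert np.isclose(q_mono.sum(), q.sum())   # sorting only permutes the values
```

    Sorting changes nothing when the original fit is already monotone, which is consistent with the asymptotic equivalence under correct specification noted above.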

  • Automated Data Summarization for Scalability in Bayesian Inference

    On November 22, 2019, from 11:00 am to 12:00 pm
    Tamara Broderick (MIT)
    E18-304

    Abstract:

    Many algorithms take prohibitively long to run on modern, large data sets. But even in complex data sets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a coreset) that is much smaller than the original dataset. Typically running algorithms on a much smaller data set will take much less computing time, but it remains to understand whether the output can be widely useful. (1) In particular, can running an analysis on a smaller coreset yield answers close to those from running on the full data set? (2) And can useful coresets be constructed automatically for new analyses, with minimal extra work from the user? We answer in the affirmative for a wide variety of problems in Bayesian inference. We demonstrate how to construct Bayesian coresets as an automatic, practical pre-processing step. We prove that our method provides geometric decay in relevant approximation error as a function of coreset size. Empirical analysis shows that our method reduces approximation error by orders of magnitude relative to uniform random subsampling of data. Though we focus on Bayesian methods here, we also show that our construction can be applied in other domains.
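
    The weighted-subset idea can be illustrated with a toy Gaussian-mean model and uniform subsampling, which is the baseline the talk improves on; the paper's coresets are built greedily and are not reproduced here, and all data and sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=10_000)        # hypothetical full data set

# Toy "data summary": a small uniformly drawn subset, weighted so that its
# log-likelihood is an unbiased stand-in for the full-data log-likelihood.
idx = rng.choice(len(x), size=100, replace=False)
w = np.full(100, len(x) / 100)               # importance weights

def log_lik(mu, data, weights=None):
    """Gaussian log-likelihood in mu (unit variance, constants dropped)."""
    ll = -0.5 * (data - mu) ** 2
    return ll.sum() if weights is None else (weights * ll).sum()

mu_grid = np.linspace(1.5, 2.5, 201)
full = np.array([log_lik(m, x) for m in mu_grid])
core = np.array([log_lik(m, x[idx], w) for m in mu_grid])

mle_full = mu_grid[full.argmax()]
mle_core = mu_grid[core.argmax()]
assert abs(mle_full - mle_core) < 0.5        # the summary tracks the full data
```

    Any downstream routine that consumes a (data, weights) pair can run unchanged on the summary, which is what makes the construction a pre-processing step.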

    Biography:

    Tamara Broderick is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT. She is a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), the MIT Statistics and Data Science Center, and the Institute for Data, Systems, and Society (IDSS). She completed her Ph.D. in Statistics at the University of California, Berkeley in 2014. Previously, she received an AB in Mathematics from Princeton University (2007), a Master of Advanced Study for completion of Part III of the Mathematical Tripos from the University of Cambridge (2008), an MPhil by research in Physics from the University of Cambridge (2009), and an MS in Computer Science from the University of California, Berkeley (2013). Her recent research has focused on developing and analyzing models for scalable Bayesian machine learning. She has been awarded an AISTATS Notable Paper Award (2019), an NSF CAREER Award (2018), a Sloan Research Fellowship (2018), an Army Research Office Young Investigator Program award (2017), Google Faculty Research Awards, an Amazon Research Award, the ISBA Lifetime Members Junior Researcher Award, the Savage Award (for an outstanding doctoral dissertation in Bayesian theory and methods), the Evelyn Fix Memorial Medal and Citation (for the Ph.D. student on the Berkeley campus showing the greatest promise in statistical research), the Berkeley Fellowship, an NSF Graduate Research Fellowship, a Marshall Scholarship, and the Phi Beta Kappa Prize (for the graduating Princeton senior with the highest academic average).

    The MIT Statistics and Data Science Center hosts guest lecturers from around the world in this weekly seminar.

  • TAP free energy, spin glasses, and variational inference

    On February 15, 2019, from 11:00 am to 12:00 pm
    Zhou Fan (Yale University)

    Abstract:
    We consider the Sherrington-Kirkpatrick model of spin glasses with ferromagnetically biased couplings. For a specific choice of the mean of the couplings, the resulting Gibbs measure is equivalent to the Bayesian posterior for a high-dimensional estimation problem known as Z2 synchronization. Statistical physics suggests computing the expectation with respect to this Gibbs measure (the posterior mean in the synchronization problem) by minimizing the so-called Thouless-Anderson-Palmer (TAP) free energy, instead of the mean field (MF) free energy. We prove that this identification is correct, provided the ferromagnetic bias is larger than a constant (i.e. the noise level is small enough in synchronization). Namely, we prove that the scaled l_2 distance between any low-energy local minimizer of the TAP free energy and the mean of the Gibbs measure vanishes in the large size limit. Our proof technique is based on upper bounding the expected number of critical points of the TAP free energy using the Kac-Rice formula. This is joint work with Song Mei and Andrea Montanari.
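
    For orientation, one common textbook form of the TAP free energy adds an Onsager correction to the naive mean-field energy and entropy. The sketch below merely evaluates that functional at a magnetization vector m in (-1, 1)^n; the signs and scaling follow one standard convention and are an assumption here, not taken from the talk:

```python
import numpy as np

def tap_free_energy(m, J, beta):
    """Mean-field energy + binary entropy + Onsager correction
    (one standard convention for SK-type models; a sketch only)."""
    n = len(m)
    q = np.mean(m ** 2)
    energy = -0.5 * beta * m @ J @ m
    entropy = np.sum((1 + m) / 2 * np.log((1 + m) / 2)
                     + (1 - m) / 2 * np.log((1 - m) / 2))
    onsager = -0.25 * beta ** 2 * n * (1 - q) ** 2
    return energy + entropy + onsager

rng = np.random.default_rng(0)
n = 50
J = rng.normal(size=(n, n)) / np.sqrt(n)
J = (J + J.T) / 2                    # symmetric random couplings
m = np.tanh(rng.normal(size=n))      # magnetizations strictly inside (-1, 1)
val = tap_free_energy(m, J, beta=0.5)
assert np.isfinite(val)
```

    The talk's results concern the local minimizers of such a functional, not its value at an arbitrary point; this sketch only fixes notation.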

    Biography:
    Zhou Fan is an Assistant Professor in the Department of Statistics and Data Science at Yale University. His research interests include random matrix theory, high dimensional and multivariate statistics, inference in random graphs and networks, discrete algorithms, and applications in genetics and computational biology. Zhou received his Ph.D. in Statistics at Stanford University, working with Iain M. Johnstone and Andrea Montanari. Prior to this, Zhou developed statistical and software tools for molecular dynamics simulations at D. E. Shaw Research.

  • Why Aren’t Network Statistics Accompanied By Uncertainty Statements?

    On March 1, 2019, from 11:00 am to 12:00 pm
    Eric Kolaczyk (Boston University)
    E18-304

    Abstract:
    Over 500K scientific articles have been published since 1999 with the word "network" in the title. And the vast majority of these report network summary statistics of one type or another. However, these numbers are rarely accompanied by any quantification of uncertainty. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, must necessarily propagate to any summary statistics reported. Perhaps surprisingly, there is little in the way of formal statistical methodology for this problem. I summarize results from our recent work, for the case of estimating the density of low-order subgraphs. Under a simple model of network error, we show that consistent estimation of such densities is impossible when the rates of error are unknown and only a single network is observed. We then develop method-of-moment estimators of subgraph density and error rates for the case where a minimal number of network replicates are available (i.e., just 2 or 3). These estimators are shown to be asymptotically normal as the number of vertices increases to infinity. We also provide confidence intervals for quantifying the uncertainty in these estimates, implemented through a novel bootstrap algorithm. We illustrate the use of our estimators in the context of gene coexpression networks — the correction for measurement error is found to have potentially substantial impact on standard summary statistics. This is joint work with Qiwei Yao and Jinyuan Chang.
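
    The simplest version of the error-propagation point is the known-rate case: if absent edges appear with probability alpha and present edges vanish with probability beta, the observed edge density is a linear blend of the true one and can be inverted directly. (The talk's harder setting, where unknown rates are recovered from 2-3 replicates by method of moments, is not shown.) A sketch with made-up rates:

```python
def corrected_edge_density(p_obs, alpha, beta):
    """Invert E[p_obs] = p*(1 - beta) + (1 - p)*alpha for the true density p,
    assuming a known false-positive rate alpha and false-negative rate beta."""
    return (p_obs - alpha) / (1 - alpha - beta)

# Round trip on a hypothetical true edge density
p_true, alpha, beta = 0.10, 0.02, 0.20
p_obs = p_true * (1 - beta) + (1 - p_true) * alpha
assert abs(corrected_edge_density(p_obs, alpha, beta) - p_true) < 1e-12
```

    Even in this idealized case the naive estimate p_obs is biased whenever alpha or beta is nonzero, which is the phenomenon the talk quantifies for general subgraph densities.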

    Biography:
    Eric Kolaczyk is a Professor of Statistics and Director of the Program in Statistics in the Department of Mathematics & Statistics at Boston University. He is also a university Data Science Faculty Fellow, and affiliated with the Division of Systems Engineering and the Programs in Bioinformatics and in Computational Neuroscience. His current research interests revolve mainly around the statistical analysis of network-indexed data, including both theory/methods development and collaborative research. He has published several books on the topic of network analysis. As an associate editor, he has served on the boards of JASA and JRSS-B in statistics, IEEE IP and TNSE in engineering, and SIMODS in mathematics. Currently he is the co-chair of the NAS Roundtable on Data Science Education. He is an elected fellow of the AAAS, ASA, and IMS, an elected senior member of the IEEE, and an elected member of the ISI.


  • Influence maximization in stochastic and adversarial settings

    On September 18, 2015, from 11:00 am to 12:00 pm

    We consider the problem of influence maximization in fixed networks, for both stochastic and adversarial contagion models. In the stochastic setting, nodes are infected in waves according to linear threshold or independent cascade models. We establish upper and lower bounds for the influence of a subset of nodes in the network, where the influence is defined as the expected number of infected nodes at the conclusion of the epidemic. We quantify the gap between our upper and lower bounds in the case of the linear threshold model and illustrate the gains of our upper bounds for independent cascade models in relation to existing results. Importantly, our lower bounds are monotonic and submodular, implying that a greedy algorithm for influence maximization is guaranteed to produce a maximizer within a 1-1/e factor of the truth. In the adversarial setting, an adversary is allowed to specify the edges through which contagion may spread, and the player chooses sets of nodes to infect in successive rounds. We establish upper and lower bounds on the pseudo-regret for possibly stochastic strategies of the adversary and player. This is joint work with Justin Khim and Varun Jog.
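
    The (1 - 1/e) guarantee invoked above is the classical result for greedy maximization of a monotone submodular set function. A generic greedy sketch, run here on a toy coverage function rather than the influence bounds from the talk:

```python
def greedy_max(f, universe, k):
    """Greedy maximization of a set function: add the element with the
    largest marginal gain, k times."""
    S = set()
    for _ in range(k):
        best = max(universe - S, key=lambda v: f(S | {v}) - f(S))
        S.add(best)
    return S

# Toy monotone submodular function: coverage (size of the union)
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: {"a"}}
f = lambda S: len(set().union(*(cover[v] for v in S))) if S else 0
S = greedy_max(f, set(cover), k=2)
assert f(S) == 3   # the best two-element solution covers 3 items; greedy attains it
```

    Monotonicity and submodularity of the lower bounds are exactly what lets this generic routine inherit the approximation guarantee when f is replaced by an influence bound.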

  • Fundamental statistical limits in causal inference

    On May 9, 2025, from 11:00 am to 12:00 pm
    Sivaraman Balakrishnan, Carnegie Mellon University
    E18-304

    Abstract: Despite tremendous methodological advances in causal inference, there remain significant gaps in our understanding of the fundamental statistical limits of estimating various causal estimands from observational data. In this talk I will survey some recent work that aims to make some progress towards closing these gaps. Particularly, I will discuss the fundamental limits of estimating various important causal estimands under classical smoothness assumptions, under new “structure-agnostic” assumptions, in a discrete setup, and under partial smoothness assumptions. Via these fundamental limits we will also attempt to understand the optimality/sub-optimality of simple, practical procedures for estimating causal effects from observational data.

    The talk will be based on joint works with Yanjun Han, Edward Kennedy, James Robins, Larry Wasserman and Tiger Zeng.

    Bio: Sivaraman is a Professor with a joint appointment in the Department of Statistics and Data Science, and the Machine Learning Department at Carnegie Mellon. This year he is a long-term visitor at the Simons Institute, and a visiting scholar in the Department of Statistics at Berkeley. Prior to Carnegie Mellon he was a postdoctoral researcher at UC Berkeley working with Martin Wainwright and Bin Yu. His research interests are broadly in statistical machine learning and algorithmic statistics. Particular areas that he is currently most fascinated by include optimal transport, causal inference, robust statistics and minimax hypothesis testing.

  • Tractable Agreement Protocols

    On May 2, 2025, from 11:00 am to 12:00 pm
    Aaron Roth, University of Pennsylvania
    E18-304

    Abstract: As ML models become increasingly powerful, it is an attractive proposition to use them in important decision making pipelines, in collaboration with human decision makers. But how should a human being and a machine learning model collaborate to reach decisions that are better than either of them could achieve on their own? If the human and the ML model were perfect Bayesians, operating in a setting with a commonly known and correctly specified prior, Aumann’s classical agreement theorem would give us one answer: they could engage in conversation about the task at hand, and their conversation would be guaranteed to converge to (accuracy-improving) agreement. This classical result, however, would require making many implausible assumptions, both about the knowledge and computational power of both parties. We show how to recover similar (and more general) results using only computationally and statistically tractable assumptions, which substantially relax full Bayesian rationality. We further give weak-learning conditions under which this collaboration will result in “information aggregation” — i.e. predictions that are as accurate as could have been made by a model that had access to both parties’ observations, even though neither party in the interaction actually has access to these pooled observations.

    Joint work with Natalie Collina, Varun Gupta, and Surbhi Goel, based on a paper that will appear in STOC 2025, and with Natalie Collina, Ira Globus-Harris, Varun Gupta, Surbhi Goel, and Mirah Shi based on a new preprint.

    Bio: Aaron Roth is the Henry Salvatori Professor of Computer and Cognitive Science, in the Computer and Information Sciences department at the University of Pennsylvania, with a secondary appointment in the Wharton statistics department. He is affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program.  He is also an Amazon Scholar at Amazon AWS. He is the recipient of the Hans Sigrist Prize, a Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and research awards from Yahoo, Amazon, and Google.  His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory, learning theory, and machine learning.  Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.” Together with Michael Kearns, he is the author of “The Ethical Algorithm”.

  • How should we do linear regression?

    On April 25, 2025, from 11:00 am to 12:00 pm
    Richard Samworth, University of Cambridge
    E18-304

    Abstract: In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency.
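
    The population-level object can be sketched directly for the Cauchy example: the derivative of the log-density is non-monotone, and its best decreasing approximation under a density-weighted L2 norm (computed below with a hand-rolled pool-adjacent-violators routine; the grid, weighting, and implementation are illustrative assumptions, not the paper's procedure) is flattened, Huber-like, in the tails:

```python
import numpy as np

def pava_increasing(y, w):
    """Weighted least-squares isotonic (increasing) fit via
    pool-adjacent-violators."""
    vals, wts, cnts = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / (wts[-2] + wts[-1])
            wts[-2:] = [wts[-2] + wts[-1]]
            cnts[-2:] = [cnts[-2] + cnts[-1]]
            vals[-2:] = [v]
    return np.repeat(vals, cnts)

t = np.linspace(-8, 8, 1601)
f = 1 / (np.pi * (1 + t ** 2))    # Cauchy density
dlogf = -2 * t / (1 + t ** 2)     # derivative of the log-density: non-monotone

# Best decreasing approximation under the density-weighted L2 norm:
# flip signs, fit increasing, flip back.
psi = -pava_increasing(-dlogf, f)

assert np.all(np.diff(psi) <= 1e-12)   # decreasing, hence an M-estimator
                                       # with a convex loss (derivative -psi)
```

    A decreasing psi is exactly what makes the induced loss (whose derivative is -psi) convex, which is the property the optimality result is stated over.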

    Bio: Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a full professor in 2013 and the Professor of Statistical Science in 2017.  His main research interests are in high-dimensional and nonparametric statistics; he has developed methods and theory for shape-constrained inference, missing data, subgroup selection, data perturbation techniques (subsampling, the bootstrap, random projections, knockoffs), changepoint estimation and independence testing, amongst others. Richard currently holds a European Research Council Advanced Grant.  He received the COPSS Presidents’ Award in 2018, was elected a Fellow of the Royal Society in 2021 and served as co-editor of the Annals of Statistics (2019-2021).

  • Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data

    On April 18, 2025, from 11:00 am to 12:00 pm
    Dennis Shen, University of Southern California
    E18-304

    Abstract: One dominant approach to evaluate the causal effect of a treatment is through panel data analysis, whereby the behaviors of multiple units are observed over time. The information across time and units motivates two general approaches: (i) horizontal regression (i.e., unconfoundedness), which exploits time series patterns, and (ii) vertical regression (e.g., synthetic controls), which exploits cross-sectional patterns. Conventional wisdom often considers the two approaches to be different. We establish this position to be partly false for estimation but generally true for inference. In the absence of any assumptions, we show that both approaches yield algebraically equivalent point estimates for several standard estimators. However, the source of randomness assumed by each approach leads to a distinct estimand and quantification of uncertainty even for the same point estimate. This emphasizes that researchers should carefully consider where the randomness stems from in their data, as it has direct implications for the accuracy of inference.
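
    The estimation-equivalence claim can be seen in a toy example with minimum-norm least squares: with a single post-period, the vertical (synthetic-control-style) and horizontal (unconfoundedness-style) regressions return the same point estimate, because pinv(Z).T = pinv(Z.T). Everything below is simulated and illustrates one instance of the claim, not the paper's full set of estimators:

```python
import numpy as np

rng = np.random.default_rng(1)
T0, N = 8, 5                       # pre-periods, control units (made up)
Z = rng.normal(size=(T0, N))       # control-unit outcomes, pre-period
z_post = rng.normal(size=N)        # control-unit outcomes, one post-period
y_pre = rng.normal(size=T0)        # treated unit's pre-period outcomes

# Vertical regression: weights over control units
w = np.linalg.pinv(Z) @ y_pre
y_hat_vertical = z_post @ w

# Horizontal regression: coefficients over pre-periods
b = np.linalg.pinv(Z.T) @ z_post
y_hat_horizontal = y_pre @ b

assert np.isclose(y_hat_vertical, y_hat_horizontal)  # identical point estimates
```

    The two fits nonetheless treat different quantities as random (rows vs. columns of the panel), which is why their uncertainty quantification diverges even as the point estimates coincide.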

    Bio:
    Dennis Shen is an assistant professor in the Data Sciences and Operations Department at the USC Marshall School of Business. Before joining USC, he was a FODSI postdoctoral fellow at the Simons Institute at UC Berkeley. He also served as a technical consultant for Uber Technologies and TauRx Therapeutics. He has received several recognitions for his work, including the INFORMS George B. Dantzig Dissertation Award (2nd to his esteemed colleague, Somya) and the MIT George Sprowls PhD Thesis Award in Artificial Intelligence & Decision-making.

  • Causal Inference on Outcomes Learned from Text

    On April 11, 2025, from 11:00 am to 12:00 pm
    Jann Spiess, Stanford University
    E18-304

    Abstract:

    (with Iman Modarressi and Amar Venugopal; arxiv.org/abs/2503.00725)

    We propose a machine-learning tool that yields causal inference on text in randomized trials. Based on a simple econometric framework in which text may capture outcomes of interest, our procedure addresses three questions: First, is the text affected by the treatment? Second, on which outcomes does the effect operate? And third, how complete is our description of causal effects? To answer all three questions, our approach uses large language models (LLMs) that suggest systematic differences across two groups of text documents and then provides valid inference based on costly validation. Specifically, we highlight the need for sample splitting to allow for statistical validation of LLM outputs, as well as the need for human labeling to validate substantive claims about how documents differ across groups. We illustrate the tool in a proof-of-concept application using abstracts of academic manuscripts.
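
    The "costly validation" step can be sketched as a two-sample test on held-out, human-labeled documents: after an LLM proposes a candidate difference on one split, validators label whether the feature appears in each held-out document, and a standard two-proportion z-test yields valid inference. The labels and rates below are simulated stand-ins, not from the paper:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
# 1 = a human validator judges the LLM-proposed feature present (simulated)
treated = rng.binomial(1, 0.6, size=500)   # held-out treated-arm documents
control = rng.binomial(1, 0.4, size=500)   # held-out control-arm documents

p1, p0 = treated.mean(), control.mean()
se = sqrt(p1 * (1 - p1) / len(treated) + p0 * (1 - p0) / len(control))
z = (p1 - p0) / se

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

p_value = 2 * (1 - norm_cdf(abs(z)))
assert p_value < 0.05   # a 20-point gap at n = 500 per arm is easily detected
```

    Because the hypothesis was generated on a disjoint split, the test above is an ordinary pre-registered comparison, which is the role sample splitting plays in the framework.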

    Bio:
    Jann is an econometrician in the OIT group at Stanford GSB. His current research focuses broadly on two related themes: (1) high-dimensional and robust causal inference, including work on using machine learning to improve inferences from randomized trials, robust inference in panel data, synthetic control, matching estimation, highly over-parametrized models, and high-dimensional outcome data; and (2) data-driven decision-making with misaligned objectives, including work on algorithmic fairness, human–AI interaction, the regulation of algorithms, and the design of pre-analysis plans. He holds a PhD in economics from Harvard University.
