Statistics and Data Science Seminar


  • Precise high-dimensional asymptotics for AdaBoost via max-margins & min-norm interpolants

    On November 19, 2021, from 11:00 am to 12:00 pm
    Pragya Sur (Harvard University)
    E18-304

    Abstract: This talk will introduce a precise high-dimensional asymptotic theory for AdaBoost on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is asymptotically separable. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the max-min-L1-margin and the min-L1-norm interpolant. In turn, this will characterize the generalization error of AdaBoost, when the algorithm interpolates the training data and maximizes an empirical L1 margin. On the computational front, we will provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of AdaBoost; for instance, the larger the dimensionality ratio p/n, the faster the optimization reaches interpolation. Our statistical and computational arguments can handle (1) finite-rank spiked covariance models for the feature distribution and (2) variants of AdaBoost corresponding to general Lq-geometry, for q in [1,2]. This is based on joint work with Tengyuan Liang.
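
    For reference, the two quantities named in the abstract can be written in a standard form (notation added here; it may differ from the speaker's): given training data (x_i, y_i), i = 1, ..., n, with labels y_i in {-1, +1},

    \[
    \kappa_n \;=\; \max_{\|\theta\|_1 \le 1}\; \min_{1 \le i \le n} y_i x_i^\top \theta
    \quad \text{(max-L1-margin)},
    \qquad
    \hat\theta \;\in\; \arg\min\bigl\{ \|\theta\|_1 \,:\, y_i x_i^\top \theta \ge 1 \ \text{for all } i \bigr\}
    \quad \text{(min-L1-norm interpolant)}.
    \]

    On separable data the two are linked by a standard rescaling argument: any minimizer satisfies \|\hat\theta\|_1 = 1/\kappa_n, which is why an exact analysis of one quantity delivers the other.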

    Bio: Pragya Sur is an Assistant Professor in the Statistics Department at Harvard University. Her research broadly spans high-dimensional statistics, statistical machine learning, and robust inference and prediction for multi-study/multi-environment heterogeneous data. She is simultaneously interested in applications of large-scale statistical methods to computational neuroscience and genetics. Her research is currently supported by a William F. Milton Fund Award and an NSF DMS award. Previously, she was a postdoctoral fellow at the Center for Research on Computation and Society, Harvard John A. Paulson School of Engineering and Applied Sciences. She received a Ph.D. in Statistics from Stanford University in 2019, where she was awarded the Theodore W. Anderson Theory of Statistics Dissertation Award and a Ric Weiland Graduate Fellowship in the Humanities and Sciences.

  • Causal Representation Learning – A Proposal

    On April 16, 2022, from 11:00 am to 12:00 pm
    Caroline Uhler, MIT
    E18-304

    Abstract: The development of CRISPR-based assays and small molecule screens holds the promise of engineering precise cell state transitions to move cells from one cell type to another or from a diseased state to a healthy state. The main bottleneck is the huge space of possible perturbations/interventions, where even with the breathtaking technological advances in single-cell biology it will never be possible to experimentally perturb all combinations of thousands of genes or compounds. This important biological problem calls for a framework that can integrate data from different modalities to identify causal representations, predict the effect of unseen interventions, and identify the optimal interventions to induce precise cell state transition. Traditional representation learning methods, although often highly successful in predictive tasks, do not generally elucidate causal relationships. In this talk, we will present initial ideas towards building a statistical and computational framework for causal representation learning and its application towards optimal intervention design.

    Bio: Caroline Uhler is the Henry L. and Grace Doherty Associate Professor in EECS and IDSS, a member of SDSC, LIDS, the ORC, and Machine Learning at MIT, and a core member of the Broad Institute, where she co-directs the Eric and Wendy Schmidt Center. She is an elected member of the International Statistical Institute and the recipient of a Simons Investigator Award, a Sloan Research Fellowship, an NSF CAREER Award, a Sofja Kovalevskaja Award from the Humboldt Foundation, and a START Award from the Austrian Science Fund.

  • Learning with Random Features and Kernels: Sharp Asymptotics and Universality Laws

    On April 22, 2022, from 11:00 am to 12:00 pm
    Yue M. Lu, Harvard University
    E18-304

    Abstract: Many new random matrix ensembles arise in learning and modern signal processing. As shown in recent studies, the spectral properties of these matrices help answer crucial questions regarding the training and generalization performance of neural networks, and the fundamental limits of high-dimensional signal recovery. As a result, there has been growing interest in precisely understanding the spectra and other asymptotic properties of these matrices. Unlike their classical counterparts, these new random matrices are often highly structured and are the result of nonlinear transformations. This combination of structure and nonlinearity leads to substantial technical challenges when applying existing tools from random matrix theory to these new random matrix ensembles. In this talk, we will consider learning by random feature models and the related problem of kernel ridge regression. In each case, a nonlinear random matrix plays a prominent role. We provide an exact characterization of the asymptotic training and generalization errors of these models. These results reveal the important roles played by the regularization, the loss function and the activation function in the mitigation of the double descent phenomenon in learning. The asymptotic analysis is made possible by a general universality theorem, which establishes the asymptotic equivalence between the nonlinear random matrices and a surrogate linear random matrix ensemble that is much easier to work with.
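
    As a concrete companion to the abstract, the sketch below fits a random feature model by ridge regression on synthetic data. It is a minimal NumPy illustration of the setup being analyzed, not code from the speaker; the feature dimension, tanh activation, and regularization strength are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data: y = <x, beta> + noise.
    n, d, p = 300, 50, 400                       # samples, input dim, random features
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    beta = rng.standard_normal(d)
    y = X @ beta + 0.1 * rng.standard_normal(n)

    # Random feature map z(x) = tanh(W x) with fixed random weights W.
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    Z = np.tanh(X @ W.T)                         # n x p nonlinear random feature matrix

    # Ridge regression in feature space: a_hat = (Z'Z + lam*n*I)^{-1} Z'y.
    lam = 1e-2
    a_hat = np.linalg.solve(Z.T @ Z + lam * n * np.eye(p), Z.T @ y)

    # Training error, and test error on fresh data from the same model.
    X_test = rng.standard_normal((n, d)) / np.sqrt(d)
    y_test = X_test @ beta + 0.1 * rng.standard_normal(n)
    Z_test = np.tanh(X_test @ W.T)
    print("train MSE:", np.mean((Z @ a_hat - y) ** 2))
    print("test MSE:", np.mean((Z_test @ a_hat - y_test) ** 2))

    The training and test errors printed here are the kinds of quantities whose behavior the talk characterizes exactly in the proportional regime where n, d, and p grow together.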

    Bio: Yue M. Lu attended the University of Illinois at Urbana-Champaign, where he received the M.Sc. degree in mathematics and the Ph.D. degree in electrical engineering, both in 2007. After his postdoctoral training at the Audiovisual Communications Laboratory at Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, he joined Harvard University, where he is currently Gordon McKay Professor of Electrical Engineering and of Applied Mathematics at the John A. Paulson School of Engineering and Applied Sciences. He is also fortunate to have held visiting appointments at Duke University in 2016 and at the École Normale Supérieure (ENS) in 2019. His research interests include theoretical and algorithmic aspects of high-dimensional signal and information processing. He is an IEEE Signal Processing Society Distinguished Lecturer and a recipient of the ECE Illinois Young Alumni Achievement Award.

    A full schedule for Spring 2022 Stochastics and Statistics Seminars can be found here: https://stat.mit.edu/seminars/upcoming/

  • Stein’s method for multivariate continuous distributions and applications

    On September 11, 2020, from 11:00 am to 12:00 pm
    Gesine Reinert, University of Oxford
    online

    Abstract: Stein’s method is a key method for assessing distributional distance, mainly for one-dimensional distributions. In this talk we provide a general approach to Stein’s method for multivariate continuous distributions. Among the applications we consider is the Wasserstein distance between two continuous probability distributions under the assumption of existence of a Poincaré constant.

    This is joint work with Guillaume Mijoule (INRIA Paris) and Yvik Swan (Liège).
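
    As classical background (standard facts, not the new material of the talk): for a standard Gaussian target Z ~ N(0, I_d) and sufficiently smooth f, the multivariate Stein identity reads

    \[
    \mathbb{E}\bigl[\Delta f(Z) - Z^\top \nabla f(Z)\bigr] = 0,
    \]

    and a random vector X satisfies a Poincaré inequality with constant C_P if \operatorname{Var}(f(X)) \le C_P\, \mathbb{E}\|\nabla f(X)\|^2 for all such f. Characterizations of this kind are the starting point for Stein-type bounds on distances such as the Wasserstein distance, with C_P the Poincaré constant referred to in the abstract.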

    Bio: Gesine Reinert is a Research Professor in the Department of Statistics and at Keble College, both at the University of Oxford. She is also a Fellow of the Alan Turing Institute and a Fellow of the IMS. Her research is centered around Stein’s method and the analysis of networks and other complex structures. She currently chairs the Applied Probability Section of the Royal Statistical Society and is the vice-chair of the European Cooperation for Statistics of Network Data Science.

  • Influence maximization in stochastic and adversarial settings

    On November 4, 2016, from 11:00 am to 12:00 pm
    Po-Ling Loh (University of Pennsylvania)
    E18-304

    Abstract:
    We consider the problem of influence maximization in fixed networks, for both stochastic and adversarial contagion models. In the stochastic setting, nodes are infected in waves according to linear threshold or independent cascade models. We establish upper and lower bounds for the influence of a subset of nodes in the network, where the influence is defined as the expected number of infected nodes at the conclusion of the epidemic. We quantify the gap between our upper and lower bounds in the case of the linear threshold model and illustrate the gains of our upper bounds for independent cascade models in relation to existing results. Importantly, our lower bounds are monotonic and submodular, implying that a greedy algorithm for influence maximization is guaranteed to produce a maximizer within a 1-1/e factor of the truth. In the adversarial setting, an adversary is allowed to specify the edges through which contagion may spread, and the player chooses sets of nodes to infect in successive rounds. We establish upper and lower bounds on the pseudo-regret for possibly stochastic strategies of the adversary and player. This is joint work with Justin Khim and Varun Jog.
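
    To make the role of the monotone and submodular lower bounds concrete, here is a minimal greedy sketch of the kind of algorithm the (1 - 1/e) guarantee applies to. The function passed in as spread_fn is a hypothetical stand-in for any monotone submodular influence (lower) bound; this is an illustrative sketch, not code from the speakers.

    def greedy_seed_selection(nodes, spread_fn, budget):
        """Greedily add the node with the largest marginal gain in spread_fn.

        If spread_fn is monotone and submodular with spread_fn(set()) = 0, the
        greedy seed set is within a (1 - 1/e) factor of the best seed set of
        the same size (Nemhauser, Wolsey, and Fisher).
        """
        seeds = set()
        for _ in range(budget):
            gains = {v: spread_fn(seeds | {v}) - spread_fn(seeds)
                     for v in nodes if v not in seeds}
            seeds.add(max(gains, key=gains.get))
        return seeds

    # Toy usage with a simple monotone submodular function: neighborhood coverage.
    neighbors = {1: {1, 2, 3}, 2: {2, 3}, 3: {3, 4}, 4: {4, 5}}

    def coverage(S):
        return len(set().union(*(neighbors[v] for v in S))) if S else 0

    print(greedy_seed_selection(neighbors, coverage, budget=2))  # picks {1, 4}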

    Biography:
    Po-Ling Loh is an assistant professor in the ECE department at the UW-Madison, with a secondary appointment in statistics, and an affiliate of the Grainger Institute and Wisconsin Institute for Discovery. From 2014-2016, Po-Ling was an assistant professor in the statistics department at the Wharton School at the University of Pennsylvania. Po-Ling received an MS in computer science and a PhD in statistics from Berkeley in 2013 and 2014, and a BS in math with a minor in English from Caltech in 2009. She was the recipient of the 2014 Erich L. Lehmann Citation from the Berkeley statistics department for an outstanding PhD dissertation in theoretical statistics, and a best student paper award at the NIPS conference in 2012.

  • Quantile and Probability Curves without Crossing

    On February 29, 2008, from 11:00 am to 12:00 pm
    Victor Chernozhukov (MIT Econ)

    The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement — that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated non-monotone model, and then taking the resulting conditional quantile curves that by construction are monotone in the probability. This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic rearrangement of the original non-monotone function. It is shown that the monotonized curves are closer to the true curves in finite samples, for any sample size. Under correct specification, the rearranged conditional quantile curves have the same asymptotic distribution as the original non-monotone curves. Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize the estimates of conditional distribution functions. The results are derived by establishing the compact (Hadamard) differentiability of the monotonized quantile and probability curves with respect to the original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so, the compact differentiability of the rearrangement-related operators is established.
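
    In symbols, one standard way to write the construction described above (notation added here): given a possibly non-monotone estimate u -> \hat{Q}(u \mid x) of the conditional quantile function, sampling from the estimated model and reading off quantiles yields the rearranged curve

    \[
    \hat{Q}^{*}(u \mid x) \;=\; \inf\Bigl\{ q \;:\; \int_0^1 \mathbf{1}\bigl\{\hat{Q}(v \mid x) \le q\bigr\}\, dv \;\ge\; u \Bigr\},
    \]

    i.e. \hat{Q}^{*}(\cdot \mid x) is the quantile function of \hat{Q}(U \mid x) with U uniform on (0,1). It is monotone in u by construction and coincides with the original curve whenever that curve is already increasing.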

  • Fragility of Asymptotic Agreement under Bayesian Learning

    On March 21, 2008, from 11:00 am to 12:00 pm
    Daron Acemoglu (MIT Econ)

    Under the assumption that individuals know the conditional distributions of signals given the payoff-relevant parameters, existing results conclude that, as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, a vanishingly small individual uncertainty about the signal distributions can lead to a substantial (non-vanishing) amount of differences between the asymptotic beliefs. We then characterize the conditions under which a small amount of uncertainty leads only to a small amount of asymptotic disagreement. According to our characterization, this is the case if the uncertainty about the signal distributions is generated by a family with rapidly-varying tails (such as the normal or the exponential distributions). However, when this family has regularly-varying tails (such as the Pareto, the log-normal, and the t-distributions), a small amount of uncertainty leads to a substantial amount of asymptotic disagreement.

  • Exponential Error Bounds for Random Codes on the BSC

    On November 16, 2007, from 11:00 am to 12:00 pm
    David Forney (MIT LIDS)

    Abstract:

    One of Shannon’s earliest results was his determination of the capacity of the binary symmetric channel (BSC). Shannon went on to show that, with randomly chosen codes and optimal decoding, the probability of decoding error decreases exponentially for any transmission rate less than capacity. Much of the important early work of Shannon, Elias, Fano and Gallager was devoted to determining bounds on the corresponding error exponent. A later approach to this problem, pioneered by Csiszar and Korner, and now adopted in many modern information theory courses such as 6.441, determines error exponents using large-deviation theory. This approach has several advantages: (a) It can be remarkably simple and intuitive; (b) It gives correct error exponents, not just bounds; (c) It gives insight into how decoding errors typically occur. We illustrate this approach by developing the correct error exponents at all rates for both random codes and random linear codes on the BSC, assuming optimal decoding. We discuss how decoding errors typically occur. In particular, we show why the classical algebraic coding approach never had any hope of approaching channel capacity, even on the BSC. This talk is intended to be pitched at an elementary and tutorial level.
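
    As classical background for the quantities discussed above (standard formulas included for reference, not part of the talk's large-deviation development): a BSC with crossover probability p has capacity C = 1 - H_2(p), where H_2(p) = -p log_2 p - (1-p) log_2 (1-p), and Gallager's random-coding bound states that the ensemble-average error probability of a randomly chosen code of rate R under optimal decoding obeys P_e <= 2^{-n E_r(R)}, with

    \[
    E_r(R) \;=\; \max_{0 \le \rho \le 1} \Bigl[\, \rho \;-\; (1+\rho)\log_2\!\bigl(p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)}\bigr) \;-\; \rho R \Bigr],
    \]

    which is positive for every rate R < C. The talk develops the exact exponents for random and random linear codes; the classical random-coding bound above is quoted here only as a reference point.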
