Stochastics and Statistics Seminar


Gaussian Differential Privacy, with Applications to Deep Learning

Weijie Su (University of Pennsylvania)
E18-304

Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Rényi divergences. We propose an alternative relaxation we term "f-DP", which has a number of nice properties and avoids some of the difficulties associated with divergence-based relaxations. First, f-DP preserves…
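For intuition, the Gaussian special case of f-DP ("μ-GDP") has the trade-off function G_μ(α) = Φ(Φ⁻¹(1−α) − μ), the smallest achievable type II error at type I error α when distinguishing N(0,1) from N(μ,1). A minimal sketch (not code from the talk) evaluating it with SciPy:

```python
# Sketch of the Gaussian trade-off function from Gaussian DP:
# G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu).
from scipy.stats import norm

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error at type I error `alpha` for mu-GDP."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)
```

At μ = 0 the two hypotheses are indistinguishable (G_0(α) = 1 − α, perfect privacy); larger μ pushes the curve down, i.e., weaker privacy.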


Diffusion K-means Clustering on Manifolds: provable exact recovery via semidefinite relaxations

Xiaohui Chen (University of Illinois at Urbana-Champaign)
E18-304

Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given…
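The diffusion distance underlying the method can be sketched in a few lines. This toy version (kernel bandwidth and walk length chosen arbitrarily, not taken from the paper) builds the random walk on a Gaussian-kernel similarity graph and compares within- versus between-cluster diffusion distances:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated clusters in the plane (toy stand-in for manifold data).
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])

# Gaussian-kernel similarity graph and its random-walk transition matrix.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * 0.5 ** 2))
P = K / K.sum(1, keepdims=True)

t = 3
Pt = np.linalg.matrix_power(P, t)       # t-step transition probabilities
pi = K.sum(1) / K.sum()                 # stationary distribution of P

# Diffusion distance: d_t(i, j)^2 = sum_k (Pt[i,k] - Pt[j,k])^2 / pi[k]
diff = Pt[:, None, :] - Pt[None, :, :]
D = np.sqrt((diff ** 2 / pi).sum(-1))
```

Points in the same cluster mix quickly into each other's neighborhoods, so their diffusion distance is small; across clusters the walk barely communicates and the distance is large.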


Predictive Inference with the Jackknife+

Rina Foygel Barber (University of Chicago)
E18-304

Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically (in contrast, such guarantees…
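The construction itself is short: for each i, refit without point i, record the leave-one-out residual R_i, and take conservative empirical quantiles of the shifted predictions. A minimal numpy sketch with a least-squares base regressor (treat the details as illustrative, not as the authors' reference implementation):

```python
import numpy as np

def fit_predict(Xtr, ytr, Xq):
    """Least-squares fit on (Xtr, ytr); predictions at query rows Xq."""
    beta, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xq @ beta

def jackknife_plus(X, y, x_new, alpha=0.1):
    """Jackknife+ predictive interval at a single query point x_new."""
    n = len(y)
    lo, up = np.empty(n), np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        mu_i = fit_predict(X[mask], y[mask], np.vstack([X[i], x_new]))
        r_i = abs(y[i] - mu_i[0])        # leave-one-out residual
        lo[i] = mu_i[1] - r_i            # shifted predictions at x_new
        up[i] = mu_i[1] + r_i
    # Conservative empirical quantiles (indices follow the jackknife+ recipe).
    k = int(np.ceil((1 - alpha) * (n + 1)))
    return np.sort(lo)[n - k], np.sort(up)[k - 1]
```

Unlike the plain jackknife, the interval endpoints use the leave-one-out models' own predictions at x_new, which is what makes the distribution-free coverage guarantee go through.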


Tales of Random Projections

Kavita Ramanan (Brown University)
E18-304

Abstract: Properties of random projections of high-dimensional probability measures are of interest in a variety of fields, including asymptotic convex geometry, with potential applications to high-dimensional statistics and data analysis. A particular question of interest is to identify which properties of the high-dimensional measure are captured by its lower-dimensional projections. While fluctuations of these projections have been well studied over the past decade, we describe more recent work on the tail behavior of such projections, and various implications. This talk is based on…
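A concrete instance of the fluctuation results mentioned: one-dimensional projections of the uniform measure on a high-dimensional sphere are approximately standard Gaussian, a classical observation often attributed to Maxwell and Borel. A quick simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 50_000
# Uniform samples on the sphere S^{d-1}: normalize standard Gaussians.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# A single coordinate, rescaled by sqrt(d), is approximately N(0, 1).
proj = np.sqrt(d) * X[:, 0]
```

The empirical mean and standard deviation of `proj` are close to 0 and 1, matching the Gaussian limit; the talk's subject is the finer question of how the *tails* of such projections behave.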


Matrix Concentration for Products

Jonathan Niles-Weed (New York University)
online

Abstract: We develop nonasymptotic concentration bounds for products of independent random matrices. Such products arise in the study of stochastic algorithms, linear dynamical systems, and random walks on groups. Our bounds exactly match those available for scalar random variables and continue the program, initiated by Ahlswede-Winter and Tropp, of extending familiar concentration bounds to the noncommutative setting. Our proof technique relies on geometric properties of the Schatten trace class. Joint work with D. Huang, J. A. Tropp, and R. Ward.…
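To see why products of random matrices need care, compare the realized spectral norm of a product of independent random matrices with the naive submultiplicative bound, which is exponentially loose; the bounds discussed in the talk close this gap. A small simulation (dimensions and scaling chosen arbitrarily, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 50
# Independent random perturbations of the identity: I + G_i / sqrt(n*d).
mats = [np.eye(d) + rng.standard_normal((d, d)) / np.sqrt(n * d)
        for _ in range(n)]

prod = np.eye(d)
for M in mats:
    prod = M @ prod

realized = np.linalg.norm(prod, 2)                     # spectral norm of product
naive = np.prod([np.linalg.norm(M, 2) for M in mats])  # submultiplicative bound
```

Because the factors' top singular directions do not align, the realized norm stays far below the product of the individual norms.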


On Using Graph Distances to Estimate Euclidean and Related Distances

Ery Arias-Castro (University of California, San Diego)
online

Abstract: Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances. The talk will include a partial review of the literature, and then present more recent developments on the estimation of curvature-constrained distances on a surface, as well as on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph.

About the Speaker: Ery Arias-Castro received his Ph.D. in Statistics from Stanford University in 2004. He then took…
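The basic phenomenon is easy to simulate: shortest-path distances in a neighborhood graph on manifold samples track the geodesic (arc-length) distance rather than the ambient Euclidean one. A toy setup on the unit circle (parameters arbitrary, not from the talk):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(1)
# 500 points sampled on the unit circle; geodesic distance = arc length.
theta = np.sort(rng.uniform(0, 2 * np.pi, 500))
X = np.column_stack([np.cos(theta), np.sin(theta)])

# epsilon-neighborhood graph with Euclidean edge weights.
eps = 0.2
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
W = np.where(D <= eps, D, 0.0)          # zeros are non-edges for csgraph
G = shortest_path(W, method="D", directed=False)

# A nearly antipodal pair: chord (Euclidean) distance ~2, geodesic ~pi.
i = 0
j = int(np.argmax(theta >= theta[0] + np.pi))
```

The graph distance G[i, j] comes out near π (the arc length) while the straight-line distance D[i, j] is near 2, which is exactly why graph distances estimate geodesic rather than Euclidean distances.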


How to Trap a Gradient Flow

Sébastien Bubeck (Microsoft Research)
online

Abstract: In 1993, Stephen A. Vavasis proved that in any finite dimension, there exists a faster method than gradient descent to find stationary points of smooth non-convex functions. In dimension 2 he proved that 1/eps gradient queries are enough, and that 1/sqrt(eps) queries are necessary. We close this gap by providing an algorithm based on a new local-to-global phenomenon for smooth non-convex functions. Some higher-dimensional results will also be discussed. I will also present an extension of the 1/sqrt(eps)…


Stein’s method for multivariate continuous distributions and applications

Gesine Reinert (University of Oxford)
online

Abstract: Stein’s method is a key method for assessing distributional distance, mainly for one-dimensional distributions. In this talk we provide a general approach to Stein’s method for multivariate continuous distributions. Among the applications we consider is the Wasserstein distance between two continuous probability distributions under the assumption of the existence of a Poincaré constant. This is joint work with Guillaume Mijoule (INRIA Paris) and Yvik Swan (Liège).

Bio: Gesine Reinert is a Research Professor of the Department of Statistics and…
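The starting point of such approaches is the multivariate Gaussian Stein identity: E[Δf(Z) − ⟨Z, ∇f(Z)⟩] = 0 for Z ~ N(0, I_d) and sufficiently smooth f. A Monte Carlo check with the test function f(x) = Σᵢ xᵢ⁴ (chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 200_000
Z = rng.standard_normal((n, d))

# For f(x) = sum_i x_i^4:  grad f = 4 x^3,  laplacian f = 12 sum_i x_i^2.
# Stein identity: E[ laplacian f(Z) - <Z, grad f(Z)> ] = 0.
stein = 12 * (Z ** 2).sum(1) - (Z * (4 * Z ** 3)).sum(1)
```

The sample mean of `stein` is close to 0, as the identity predicts (exactly: E[12 Σ Zᵢ²] = 12d and E[4 Σ Zᵢ⁴] = 12d); characterizing distributions via such operator identities is what drives the distance bounds in the talk.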


Causal Inference and Overparameterized Autoencoders in the Light of Drug Repurposing for SARS-CoV-2

Caroline Uhler (MIT)
online

Abstract: Massive data collection holds the promise of a better understanding of complex phenomena and, ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation/intervention data (drugs, knockouts, overexpression, etc.) in biology. In order to obtain mechanistic insights from such data, a major challenge is the development of a framework that integrates observational and interventional data and allows predicting the effect of yet unseen interventions or transporting the effect of interventions…


Separating Estimation from Decision Making in Contextual Bandits

Dylan Foster (MIT)
online

Abstract: The contextual bandit is a sequential decision making problem in which a learner repeatedly selects an action (e.g., a news article to display) in response to a context (e.g., a user’s profile) and receives a reward, but only for the action they selected. Beyond the classic explore-exploit tradeoff, a fundamental challenge in contextual bandits is to develop algorithms that can leverage flexible function approximation to model similarity between contexts, yet have computational requirements comparable to classical supervised learning tasks…
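One classical way to pair a supervised-learning oracle with exploration (not necessarily the algorithm of the talk) is epsilon-greedy over per-action least-squares regressors. A toy sketch with a hypothetical linear reward model; all parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 3, 2, 5000, 0.1
# Hypothetical true per-action reward parameters (unknown to the learner).
theta = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])

A = [np.eye(d) for _ in range(K)]     # ridge-regularized Gram matrix per action
b = [np.zeros(d) for _ in range(K)]
total = 0.0
for t in range(T):
    x = rng.standard_normal(d)                          # observed context
    est = [np.linalg.solve(A[k], b[k]) @ x for k in range(K)]
    k = int(rng.integers(K)) if rng.random() < eps else int(np.argmax(est))
    r = theta[k] @ x + 0.1 * rng.standard_normal()      # reward for chosen action only
    A[k] += np.outer(x, x)                              # update that action's regressor
    b[k] += r * x
    total += r

avg_reward = total / T
```

The learner only ever sees the reward of the action it picked (bandit feedback), yet each per-action update is an ordinary regression step, illustrating the estimation/decision split the abstract alludes to; a uniformly random policy here would earn average reward 0.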



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764