Unbiased Markov chain Monte Carlo with couplings

Pierre Jacob (Harvard)
E18-304

Abstract: Markov chain Monte Carlo methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, these estimators are generally biased after any fixed number of iterations, which complicates parallel computation. In this talk I will explain how to remove this burn-in bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and averaged. I will present…
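The coupling-and-telescoping construction can be sketched on a toy two-state chain. This is a minimal illustration, not the talk's construction: the chain, its flip probabilities, and the test function are arbitrary choices made here for concreteness.

```python
import random

def step(x, u):
    # Two-state Markov chain on {0, 1}; flip probabilities are
    # arbitrary illustrative choices.
    p_flip = 0.3 if x == 0 else 0.6
    return 1 - x if u < p_flip else x

def unbiased_estimate(h, rng):
    # Glynn & Rhee-style estimator: run two coupled copies of the chain,
    # one a single step ahead, until they meet; the correction terms
    # telescope away the burn-in bias.
    x = rng.randint(0, 1)            # X_0 from the initial distribution
    y = rng.randint(0, 1)            # Y_0 from the same initial distribution
    x = step(x, rng.random())        # X_1: the X chain runs one step ahead
    est = h(x)
    while x != y:                    # loop until the meeting time
        u = rng.random()             # shared uniform couples the two chains
        x, y = step(x, u), step(y, u)
        if x != y:
            est += h(x) - h(y)       # telescoping bias-correction term
    return est                       # each replicate is unbiased for E_pi[h]

# Averaging independent replicates approximates the stationary mean; for
# these flip probabilities, pi(1) = 0.3 / (0.3 + 0.6) = 1/3.
rng = random.Random(0)
avg = sum(unbiased_estimate(lambda s: s, rng) for _ in range(50_000)) / 50_000
```

Because each replicate is unbiased, the replicates can be generated independently in parallel and simply averaged, which is the point of the construction.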


On Learning Theory and Neural Networks

Amit Daniely (Google)
E18-304

Abstract: Can learning theory, as we know it today, form a theoretical basis for neural networks? I will try to discuss this question in light of two new results -- one positive and one negative. Based on joint work with Roy Frostig, Vineet Gupta and Yoram Singer, and with Vitaly Feldman.

Biography: Amit Daniely is an Assistant Professor at the Hebrew University in Jerusalem, and a research scientist at Google Research, Tel-Aviv. Prior to that, he was a research scientist at Google Research, Mountain View. Even…


Inference in dynamical systems and the geometry of learning group actions

Sayan Mukherjee (Duke)
E18-304

Abstract: We examine consistency of the Gibbs posterior for tracking dynamical systems, using a classical idea from dynamical systems called the thermodynamic formalism. We state a variational formulation under which there is a unique posterior distribution over parameters as well as hidden states, using classic ideas from dynamical systems such as pressure and joinings. We use the example of consistency of hidden Markov models with infinite lags as an application of our theory. We develop a geometric framework that characterizes…
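For reference, the Gibbs posterior replaces the likelihood in Bayes' rule with an exponentiated empirical risk; a standard formulation (notation chosen here, not taken from the talk) is:

```latex
% Gibbs posterior: prior \pi, empirical risk R_n on data x_{1:n},
% inverse temperature (learning rate) \omega > 0
\pi_n(\theta \mid x_{1:n}) \;\propto\; \exp\{-\omega\, n\, R_n(\theta)\}\, \pi(\theta)
```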


Structure in multi-index tensor data: a trivial byproduct of simpler phenomena?

John Cunningham (Columbia)
E18-304

Abstract:  As large tensor-variate data become increasingly common across applied machine learning and statistics, complex analysis methods for these data similarly increase in prevalence.  Such a trend offers the opportunity to understand subtler and more meaningful features of the data that, ostensibly, could not be studied with simpler datasets or simpler methodologies.  While promising, these advances are also perilous: novel analysis techniques do not always consider the possibility that their results are in fact an expected consequence of some simpler, already-known…


Additivity of Information in Deep Generative Networks: The I-MMSE Transform Method

Galen Reeves (Duke University)
E18-304

Abstract:  Deep generative networks are powerful probabilistic models that consist of multiple stages of linear transformations (described by matrices) and non-linear, possibly random, functions (described generally by information channels). These models have gained great popularity due to their ability to characterize complex probabilistic relationships arising in a wide variety of inference problems. In this talk, we introduce a new method for analyzing the fundamental limits of statistical inference in settings where the model is known. The validity of our method can…
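The method's name refers to the I-MMSE identity of Guo, Shamai, and Verdú, which ties mutual information to the minimum mean-squared error in a Gaussian channel:

```latex
% I-MMSE identity for the Gaussian channel Y = \sqrt{\mathrm{snr}}\, X + N,
% with N standard Gaussian noise independent of X
\frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\, I\!\left(X;\ \sqrt{\mathrm{snr}}\, X + N\right)
  \;=\; \tfrac{1}{2}\,\mathrm{mmse}(\mathrm{snr})
```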


Transport maps for Bayesian computation

Youssef Marzouk (MIT)
E18-304

Abstract: Integration against an intractable probability measure is among the fundamental challenges of Bayesian inference. A useful approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Approximate transports can also be used to "precondition" standard Monte Carlo schemes. Yet characterizing a…
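Direct simulation by map evaluation can be illustrated with a one-dimensional toy. The map below is an arbitrary monotone choice made here for illustration, not one of the talk's constructions:

```python
import random
import math

def transport_sample(T, n, rng):
    # Draw from the pushforward T_# N(0, 1): sample the tractable
    # reference measure, then evaluate the transport map at each sample.
    return [T(rng.gauss(0.0, 1.0)) for _ in range(n)]

# Toy monotone map: T(z) = exp(z) pushes N(0, 1) forward to the
# standard lognormal distribution, whose mean is exp(1/2).
rng = random.Random(0)
draws = transport_sample(math.exp, 200_000, rng)
mean = sum(draws) / len(draws)
```

The talk concerns the harder inverse problem: constructing or approximating such a map when only the (intractable) target measure is given.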


Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

Jelani Nelson (Harvard University)
E18-304

Abstract: Consider the following problem: we monitor a sequence of edge insertions and deletions in a graph on n vertices, so there are N = (n choose 2) possible edges (e.g. monitoring a stream of friend accepts/removals on Facebook). At any point someone may say "query()", at which point we must output a random edge that exists in the graph at that time from a distribution that is statistically close to uniform. More specifically, with probability p our edge should come from a distribution close to uniform,…
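A non-streaming reference implementation of the query semantics (it stores the whole graph, i.e. exactly the memory a streaming algorithm cannot afford) might look like the following sketch:

```python
import random

class DynamicEdgeSampler:
    # Stores every live edge explicitly; the talk's lower bounds concern
    # how much less memory a one-pass streaming algorithm can use while
    # still answering the same query approximately uniformly.
    def __init__(self):
        self.edges = set()

    def insert(self, u, v):
        self.edges.add((min(u, v), max(u, v)))

    def delete(self, u, v):
        self.edges.discard((min(u, v), max(u, v)))

    def query(self, rng=random):
        # Exactly uniform over the edges present at query time.
        return rng.choice(sorted(self.edges))

g = DynamicEdgeSampler()
g.insert(1, 2); g.insert(2, 3); g.insert(1, 3)
g.delete(2, 3)
edge = g.query(random.Random(0))
```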


Sample complexity of population recovery

Yury Polyanskiy (MIT)

Abstract: In this talk we will first consider the general question of estimating a linear functional of a distribution based on noisy samples from it. We discover that the (two-point) Le Cam lower bound is in fact achievable by optimizing the bias-variance tradeoff of an empirical-mean type of estimator. Next, we apply this general framework to the specific problem of population recovery. Namely, consider a random poll of sample size n conducted on a population of individuals, where each pollee is asked to…
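The simplest instance of this setup is estimation under erasure noise. The sketch below is a hedged illustration of an empirical-mean-type estimator with a bias correction; the erasure probability and population values are arbitrary choices, not from the talk:

```python
import random

def ipw_mean(observed, mu):
    # Unbiased empirical-mean-type estimator of E[f(X)] when each sample
    # is erased (replaced by None) independently with probability mu:
    # weight every surviving sample by 1 / (1 - mu).
    n = len(observed)
    return sum(x for x in observed if x is not None) / ((1 - mu) * n)

# Simulate a population of 0/1 poll answers with mean 0.3, each answer
# erased with probability mu = 0.5.
rng = random.Random(0)
mu = 0.5
samples = [1 if rng.random() < 0.3 else 0 for _ in range(200_000)]
noisy = [x if rng.random() > mu else None for x in samples]
est = ipw_mean(noisy, mu)
```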


New provable techniques for learning and inference in probabilistic graphical models

Andrej Risteski (Princeton University)

Abstract: A common theme in machine learning is succinct modeling of distributions over large domains. Probabilistic graphical models are one of the most expressive frameworks for doing this. The two major tasks involving graphical models are learning and inference. Learning is the task of calculating the "best fit" model parameters from raw data, while inference is the task of answering probabilistic queries for a model with known parameters (e.g. what is the marginal distribution of a subset of variables, after…
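For concreteness, a marginal query on a tiny pairwise model can be answered by brute-force enumeration. The model below is an arbitrary toy, and enumeration over all 2^n assignments is precisely what scalable inference methods must avoid:

```python
from itertools import product
import math

def marginal(n_vars, pairwise, var):
    # P(X_var = 1) in a pairwise model p(x) proportional to
    # exp(sum over couplings J_ij * x_i * x_j), x in {0, 1}^n,
    # computed by summing over all 2^n assignments.
    weights = {}
    for x in product([0, 1], repeat=n_vars):
        w = math.exp(sum(J * x[i] * x[j] for (i, j), J in pairwise.items()))
        weights[x] = w
    Z = sum(weights.values())                   # partition function
    return sum(w for x, w in weights.items() if x[var] == 1) / Z

# Three binary variables with two attractive couplings; the couplings
# pull the middle variable toward 1.
p1 = marginal(3, {(0, 1): 1.0, (1, 2): 1.0}, 1)
```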


Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

Vianney Perchet (ENS Paris-Saclay)
E18-304

Abstract: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning with bandit feedback. In this problem, the objective is to minimize a global, not necessarily cumulative, convex loss function. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques ranging from bandits to convex optimization. We identify slow and fast rates of…
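A hedged sketch of the deterministic Frank-Wolfe backbone over the probability simplex follows. The talk's algorithm replaces the exact gradient with an upper-confidence estimate built from bandit feedback; the loss and step size below are standard textbook choices, not taken from the talk:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=2_000):
    # Classical Frank-Wolfe: the linear minimization oracle over the
    # simplex simply picks the vertex with the smallest gradient
    # coordinate, so iterates stay feasible by convex combination.
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)        # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy smooth convex loss ||x - p||^2 whose minimizer p lies inside the
# simplex; Frank-Wolfe drives the iterate toward p.
p = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda x: 2.0 * (x - p), np.ones(3) / 3)
```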



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764