## November 2020

## Perfect Simulation for Feynman-Kac Models using Ensemble Rejection Sampling

Arnaud Doucet - University of Oxford

Abstract: I will introduce Ensemble Rejection Sampling, a scheme for perfect simulation of a class of Feynmac-Kac models. In particular, this scheme allows us to sample exactly from the posterior distribution of the latent states of a class of non-linear non-Gaussian state-space models and from the distribution of a class of conditioned random walks. Ensemble Rejection Sampling relies on a high-dimensional proposal distribution built using ensembles of state samples and dynamic programming. Although this algorithm can be interpreted as a…

Find out more »## December 2020

## A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Net

Rong Ge - Duke University

Abstract: The training of neural networks optimizes complex non-convex objective functions, yet in practice simple algorithms achieve great performances. Recent works suggest that over-parametrization could be a key ingredient in explaining this discrepancy. However, current theories could not fully explain the role of over-parameterization. In particular, they either work in a regime where neurons don't move much, or require large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural net. We show…

Find out more »## February 2021

## Faster and Simpler Algorithms for List Learning

Jerry Li, Microsoft Research

Abstract: The goal of list learning is to understand how to learn basic statistics of a dataset when it has been corrupted by an overwhelming fraction of outliers. More formally, one is given a set of points $S$, of which an $\alpha$-fraction $T$ are promised to be well-behaved. The goal is then to output an $O(1 / \alpha)$ sized list of candidate means, so that one of these candidates is close to the true mean of the points in $T$.…

Find out more »## Self-regularizing Property of Nonparametric Maximum Likelihood Estimator in Mixture Models

Yury Polyanskiy, MIT

Abstract: Introduced by Kiefer and Wolfowitz 1956, the nonparametric maximum likelihood estimator (NPMLE) is a widely used methodology for learning mixture models and empirical Bayes estimation. Sidestepping the non-convexity in mixture likelihood, the NPMLE estimates the mixing distribution by maximizing the total likelihood over the space of probability measures, which can be viewed as an extreme form of over parameterization. In this work we discover a surprising property of the NPMLE solution. Consider, for example, a Gaussian mixture model on…

Find out more »## March 2021

## Detection Thresholds for Distribution-Free Non-Parametric Tests: The Curious Case of Dimension 8

Bhaswar B. Bhattacharya, UPenn Wharton

Abstract: Two of the fundamental problems in non-parametric statistical inference are goodness-of-fit and two-sample testing. These two problems have been extensively studied and several multivariate tests have been proposed over the last thirty years, many of which are based on geometric graphs. These include, among several others, the celebrated Friedman-Rafsky two-sample test based on the minimal spanning tree and the K-nearest neighbor graphs, and the Bickel-Breiman spacings tests for goodness-of-fit. These tests are asymptotically distribution-free, universally consistent, and computationally efficient…

Find out more »## On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

James Robins, Harvard

Abstract: For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators ψ̂ 1 are the state-of-the-art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal (1−α) Wald confidence interval ψ̂ 1±zα/2ˆ may still undercover even in large samples, because the bias of ψ̂ 1 may be of the same…

Find out more »## Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization

Daniel Roy, University of Toronto

Abstract: We consider sequential prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. We quantify relaxations of the classical i.i.d. assumption in terms of these constraint sets, with i.i.d. sequences at one extreme and adversarial mechanisms at the other. The Hedge algorithm, long known to be minimax optimal in the adversarial regime, was recently shown to be minimax optimal for i.i.d. data. We show that Hedge with deterministic learning rates is suboptimal…

Find out more »## Testing the I.I.D. assumption online

Vladimir Vovk, Royal Holloway, University of London

Abstract: Mainstream machine learning, despite its recent successes, has a serious drawback: while its state-of-the-art algorithms often produce excellent predictions, they do not provide measures of their accuracy and reliability that would be both practically useful and provably valid. Conformal prediction adapts rank tests, popular in nonparametric statistics, to testing the IID assumption (the observations being independent and identically distributed). This gives us practical measures, provably valid under the IID assumption, of the accuracy and reliability of predictions produced by…

Find out more »## April 2021

## Functions space view of linear multi-channel convolution networks with bounded weight norm

Suriya Gunasekar, Microsoft Research

Abstract: The magnitude of the weights of a neural network is a fundamental measure of complexity that plays a crucial role in the study of implicit and explicit regularization. For example, in recent work, gradient descent updates in overparameterized models asymptotically lead to solutions that implicitly minimize the ell_2 norm of the parameters of the model, resulting in an inductive bias that is highly architecture dependent. To investigate the properties of learned functions, it is natural to consider a function…

Find out more »## Sample Size Considerations in Precision Medicine

Eric Laber, Duke University

Abstract: Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis with formal evaluation of the estimated optimal regime deferred to a follow-up trial. However, running a follow-up trial to evaluate an estimated optimal treatment regime…

Find out more »