## February 2017

## Non-classical Berry-Esseen inequality and accuracy of the weighted bootstrap.

Mayya Zhilova (Georgia Tech)

Abstract: In this talk, we will study higher-order accuracy of the weighted bootstrap procedure for estimating the distribution of a sum of independent random vectors with bounded fourth moments, on the set of all Euclidean balls. Our approach is based on a Berry-Esseen type inequality which extends the classical normal approximation bound. These results justify, in a non-asymptotic setting, that the weighted bootstrap can outperform the Gaussian (or chi-squared) approximation in accuracy with respect to dimension and sample size. In addition, the presented results lead…
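
As a rough illustration of the kind of procedure being analyzed, the sketch below estimates the probability that the centered, normalized sum of random vectors falls in a Euclidean ball by a multiplier (weighted) bootstrap. The Gaussian multiplier weights, the function name `weighted_bootstrap_ball_prob` and its parameters are assumptions made for this example; the talk may use a different weight distribution, and this is not the speaker's implementation.

```python
import math
import random


def weighted_bootstrap_ball_prob(xs, center, radius, n_boot=2000, seed=0):
    """Bootstrap estimate of P(||S - center|| <= radius), where
    S = n^{-1/2} * sum_i (X_i - mean(X)).

    Each replicate reweights the centered observations by i.i.d. multiplier
    weights with mean 0 and variance 1 (Gaussian here, an illustrative
    assumption) and checks whether the reweighted sum lies in the ball.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in xs]

    hits = 0
    for _ in range(n_boot):
        s = [0.0] * d
        for c in centered:
            w = rng.gauss(0.0, 1.0)  # multiplier weight
            for j in range(d):
                s[j] += w * c[j]
        dist = math.sqrt(sum((s[j] / math.sqrt(n) - center[j]) ** 2 for j in range(d)))
        if dist <= radius:
            hits += 1
    return hits / n_boot
```

In the toy case `xs = [[1.0], [-1.0]] * 50` the bootstrap sum is exactly standard normal, so `weighted_bootstrap_ball_prob(xs, [0.0], 1.96)` should come out close to 0.95.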

## Slope meets Lasso in sparse linear regression

Pierre Bellec (Rutgers)

Abstract: We will present results in sparse linear regression on two convex regularized estimators, the Lasso and the recently introduced Slope estimator, in the high-dimensional setting where the number of covariates p is larger than the number of observations n. The estimation and prediction performance of these estimators will be presented, as well as a comparative study of the assumptions on the design matrix. https://arxiv.org/pdf/1605.08651.pdf Biography: I am an Assistant Professor of statistics at Rutgers, the State University of New Jersey. I obtained my PhD…
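
For a concrete reference point, here is a minimal sketch of one standard way to compute the Lasso, proximal gradient descent (ISTA) with soft-thresholding, for the objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁. It is not taken from the talk; the Slope estimator would replace soft-thresholding with the proximal map of the sorted ℓ1 norm, which is omitted here. The function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * |.|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 by ISTA.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, where L = ||X||_F^2 / n upper-bounds the Lipschitz
    # constant lambda_max(X^T X) / n of the smooth part's gradient.
    L = sum(v * v for row in X for v in row) / n
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        beta = [soft_threshold(beta[j] - grad[j] / L, lam / L) for j in range(p)]
    return beta
```

With an orthogonal design the problem separates by coordinate: for `X` the 3×3 identity, `y = [3, 0.1, -2]` and `lam = 0.1`, the iterates converge to approximately `[2.7, 0.0, -1.7]`, i.e. each `y_j` soft-thresholded at `n * lam = 0.3`.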

## Causal Discovery in Systems with Feedback Cycles

Frederick Eberhardt (Caltech)

Abstract: While causal relations are generally considered to be anti-symmetric, we often find that over time there are feedback systems such that a variable can have a causal effect on itself. Such "cyclic" causal systems pose significant challenges for causal analysis, both in terms of the appropriate representation of the system under investigation, and for the development of algorithms that attempt to infer as much as possible about the underlying causal system from statistical data. This talk will aim to provide some theoretical insights about…

## Estimating the number of connected components of large graphs based on subgraph sampling

Yihong Wu (Yale)

Abstract: Learning properties of large graphs from samples is an important problem in statistical network analysis, dating back to the early work of Goodman and Frank. We revisit the problem formulated by Frank (1978) of estimating the number of connected components in a graph of N vertices based on the subgraph sampling model, where we observe the subgraph induced by n vertices drawn uniformly at random. The key question is whether it is possible to achieve accurate estimation, i.e., vanishing normalized mean-square error,…

## March 2017

## Computing partition functions by interpolation

Alexander Barvinok (University of Michigan)

Abstract: Partition functions are just multivariate polynomials with a great many monomials enumerating combinatorial structures of a particular type, and their efficient computation (approximation) is of interest for combinatorics, statistics, physics and computational complexity. I’ll present a general principle: the partition function can be efficiently approximated in a domain if it has no complex zeros in a slightly larger domain, and illustrate it on the examples of the permanent of a matrix, the independence polynomial of a graph and, time permitting, the graph homomorphism partition…

## Jagers-Nerman stable age distribution theory, change point detection and power of two choices in evolving networks

Shankar Bhamidi (UNC)

Abstract: (i) Change point detection for networks: We consider the preferential attachment model. We formulate and study the regime where the network transitions from one evolutionary scheme to another. In the large network limit we derive asymptotics for various functionals of the network including degree distribution and maximal degree. We study functional central limit theorems for the evolution of the degree distribution which feed into proving consistency of a proposed estimator of the change point. (ii) Power of choice and network…

## September 2017

## New provable techniques for learning and inference in probabilistic graphical models

Andrej Risteski (Princeton University)

Abstract: A common theme in machine learning is succinct modeling of distributions over large domains. Probabilistic graphical models are one of the most expressive frameworks for doing this. The two major tasks involving graphical models are learning and inference. Learning is the task of calculating the "best fit" model parameters from raw data, while inference is the task of answering probabilistic queries for a model with known parameters (e.g. what is the marginal distribution of a subset of variables, after…

## Sample complexity of population recovery

Yury Polyanskiy (MIT)

Abstract: In this talk we will first consider a general question of estimating a linear functional of a distribution based on noisy samples from it. We discover that the (two-point) Le Cam lower bound is in fact achievable by optimizing the bias-variance tradeoff of an empirical-mean type of estimator. Next, we apply this general framework to the specific problem of population recovery. Namely, consider a random poll of sample size n conducted on a population of individuals, where each pollee is asked to…

## Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

Jelani Nelson (Harvard University)

Abstract: Consider the following problem: we monitor a sequence of edge insertions and deletions in a graph on n vertices, so there are N = (n choose 2) possible edges (e.g. monitoring a stream of friend accepts/removals on Facebook). At any point someone may say "query()", at which point we must output a random edge that exists in the graph at that time from a distribution that is statistically close to uniform. More specifically, with probability p our edge should come from a distribution close to uniform,…

## October 2017

## Transport maps for Bayesian computation

Youssef Marzouk (MIT)

Abstract: Integration against an intractable probability measure is among the fundamental challenges of Bayesian inference. A useful approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Approximate transports can also be used to "precondition" standard Monte Carlo schemes. Yet characterizing a…
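
A one-dimensional toy case shows the mechanism described above: when the target's CDF F is known, the monotone map T = F⁻¹ ∘ Φ pushes a standard Gaussian reference forward to the target, so target samples are obtained simply by evaluating T at reference samples. The exponential target and the function names are illustrative assumptions; the talk concerns constructing (approximate) maps for intractable, typically high-dimensional measures, which this sketch does not attempt.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def exponential_transport_map(z, rate=1.0):
    """Monotone map T = F^{-1} o Phi pushing N(0, 1) forward to Exponential(rate).

    F^{-1}(u) = -log(1 - u) / rate, and 1 - Phi(z) = Phi(-z), which avoids
    rounding 1 - u to zero for large z.
    """
    return -math.log(_STD_NORMAL.cdf(-z)) / rate


def sample_via_transport(n, rate=1.0, seed=0):
    """Draw reference samples z ~ N(0, 1) and push them through the map."""
    rng = random.Random(seed)
    return [exponential_transport_map(rng.gauss(0.0, 1.0), rate) for _ in range(n)]
```

For example, the sample mean of `sample_via_transport(20000, rate=2.0)` should be close to the exponential mean `1 / rate = 0.5`.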

## Additivity of Information in Deep Generative Networks: The I-MMSE Transform Method

Galen Reeves (Duke University)

Abstract: Deep generative networks are powerful probabilistic models that consist of multiple stages of linear transformations (described by matrices) and non-linear, possibly random, functions (described generally by information channels). These models have gained great popularity due to their ability to characterize complex probabilistic relationships arising in a wide variety of inference problems. In this talk, we introduce a new method for analyzing the fundamental limits of statistical inference in settings where the model is known. The validity of our method can…

## Structure in multi-index tensor data: a trivial byproduct of simpler phenomena?

John Cunningham (Columbia)

Abstract: As large tensor-variate data become increasingly common across applied machine learning and statistics, complex analysis methods for these data similarly increase in prevalence. Such a trend offers the opportunity to understand subtler and more meaningful features of the data that, ostensibly, could not be studied with simpler datasets or simpler methodologies. While promising, these advances are also perilous: novel analysis techniques do not always consider the possibility that their results are in fact an expected consequence of some simpler, already-known…

## Inference in dynamical systems and the geometry of learning group actions

Sayan Mukherjee (Duke)

Abstract: We examine consistency of the Gibbs posterior for dynamical systems using a classical idea in dynamical systems called the thermodynamic formalism in tracking dynamical systems. We state a variational formulation under which there is a unique posterior distribution of parameters as well as hidden states, using classic ideas from dynamical systems such as pressure and joinings. We use consistency of hidden Markov models with infinite lags as an example application of our theory. We develop a geometric framework that characterizes…

## On Learning Theory and Neural Networks

Amit Daniely (Google)

Abstract: Can learning theory, as we know it today, form a theoretical basis for neural networks? I will try to discuss this question in light of two new results, one positive and one negative. Based on joint work with Roy Frostig, Vineet Gupta and Yoram Singer, and with Vitaly Feldman. Biography: Amit Daniely is an Assistant Professor at the Hebrew University in Jerusalem, and a research scientist at Google Research, Tel-Aviv. Prior to that, he was a research scientist at Google Research, Mountain-View. Even…

## November 2017

## Unbiased Markov chain Monte Carlo with couplings

Pierre Jacob (Harvard)

Abstract: Markov chain Monte Carlo methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, these estimators are generally biased after any fixed number of iterations, which complicates parallel computation. In this talk I will explain how to remove this burn-in bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and averaged. I will present…
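
The telescopic-sum construction can be sketched on a finite state space, where a maximal coupling of two transition rows is easy to sample. The sketch assumes a chain given as a transition matrix `P`, a test function `h`, and an initial distribution `pi0`; these names and the finite-state setting are chosen for illustration and are not the speaker's code. One draw of the estimator is H_k = h(X_k) + Σ_{t=k+1}^{τ−1} [h(X_t) − h(Y_{t−1})], where X runs one step ahead of Y and τ is the first time X_t = Y_{t−1}.

```python
import random


def sample_discrete(probs, rng):
    """Draw an index i with probability probs[i]."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def maximal_coupling(p, q, rng):
    """Sample (i, j) with i ~ p, j ~ q, and P(i == j) as large as possible."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    mass = sum(overlap)
    # Treat a numerically full overlap as identical rows (avoids dividing by ~0).
    if rng.random() < mass or mass > 1.0 - 1e-12:
        i = sample_discrete([o / mass for o in overlap], rng)
        return i, i
    rest = 1.0 - mass
    i = sample_discrete([(a - o) / rest for a, o in zip(p, overlap)], rng)
    j = sample_discrete([(b - o) / rest for b, o in zip(q, overlap)], rng)
    return i, j


def unbiased_estimate(P, h, pi0, k, rng):
    """One draw of H_k (requires k >= 1); averaging draws estimates E_pi[h]."""
    x = sample_discrete(P[sample_discrete(pi0, rng)], rng)  # X_1
    y = sample_discrete(pi0, rng)                            # Y_0
    t, met, est = 1, x == y, 0.0
    while True:
        if t == k:
            est += h(x)
        elif t > k and not met:
            est += h(x) - h(y)  # bias-correction term of the telescopic sum
        if met and t >= k:
            return est
        if met:
            x = y = sample_discrete(P[x], rng)  # chains stay together after meeting
        else:
            x, y = maximal_coupling(P[x], P[y], rng)  # (X_{t+1}, Y_t)
        t += 1
        met = met or x == y
```

For the two-state chain `P = [[0.9, 0.1], [0.5, 0.5]]`, whose stationary distribution is (5/6, 1/6), averaging many independent draws with `h(s) = float(s == 1)` should approach 1/6 even for small `k`, since each draw is unbiased rather than only asymptotically correct.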

## Statistics, Computation and Learning with Graph Neural Networks

Joan Bruna Estrach (NYU)

Abstract: Deep Learning, thanks mostly to Convolutional architectures, has recently transformed computer vision and speech recognition. Their ability to encode geometric stability priors, while offering enough expressive power, is at the core of their success. In such settings, geometric stability is expressed in terms of local deformations, and it is enforced thanks to localized convolutional operators that separate the estimation into scales. Many problems across applied sciences, from particle physics to recommender systems, are formulated in terms of signals defined over…

## Generative Models and Compressed Sensing

Alex Dimakis (University of Texas at Austin)

Abstract: The goal of compressed sensing is to estimate a vector from an under-determined system of noisy linear measurements, by making use of prior knowledge in the relevant domain. For most results in the literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we assume that the unknown vectors lie near the range of a generative model, e.g. a GAN…

## December 2017

## Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time

Susan Murphy (Harvard)

Abstract: A formidable challenge in designing sequential treatments is to determine when and in which context it is best to deliver treatments. Consider treatment for individuals struggling with chronic health conditions. Operationally designing the sequential treatments involves the construction of decision rules that input current context of an individual and output a recommended treatment. That is, the treatment is adapted to the individual's context; the context may include current health status, current level of social support and current level of adherence…

## Stochastics and Statistics Seminar

Alex Bloemendal (Broad Institute)

Biography: Alex Bloemendal is a computational scientist at the Broad Institute of MIT and Harvard and at the Analytic and Translational Genetics Unit of Massachusetts General Hospital. As a member of the lab of Broad Institute member Ben Neale, Bloemendal leads a group in developing new methods to analyze genetic data, harnessing its unprecedented scope and scale to discover the genetic causes of disease. He also co-founded and directs the Models, Inference & Algorithms initiative at the Broad, bridging computational biology, mathematical…

## February 2018

## Connections between structured estimation and weak submodularity

Sahand Negahban (Yale University)

Abstract: Many modern statistical estimation problems rely on imposing additional structure in order to reduce the statistical complexity and provide interpretability. Unfortunately, these structures often are combinatorial in nature and result in computationally challenging problems. In parallel, the combinatorial optimization community has placed significant effort in developing algorithms that can approximately solve such optimization problems in a computationally efficient manner. The focus of this talk is to expand upon ideas that arise in combinatorial optimization and connect those algorithms and…

## Variable selection using presence-only data with applications to biochemistry

Garvesh Raskutti (University of Wisconsin)

Abstract: In a number of problems, we are presented with positive and unlabelled data, referred to as presence-only responses. The application I present today involves studying the relationship between protein sequence and function; presence-only data arise because for many experiments it is impossible to obtain a large set of negative (non-functional) sequences. Furthermore, if the number of variables is large and the goal is variable selection (as in this case), a number of statistical and computational challenges arise due…

## User-friendly guarantees for the Langevin Monte Carlo

Arnak Dalalyan (ENSAE-CREST)

Abstract: In this talk, I will revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm for sampling from a smooth and (strongly) log-concave density. I will discuss the existing results when the accuracy of sampling is measured in the Wasserstein distance and provide further insights on relations between, on the one hand, the Langevin Monte Carlo for sampling and, on the other hand, the gradient descent for optimization. I will also present non-asymptotic guarantees for the accuracy…
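
For concreteness, the Langevin Monte Carlo iteration is commonly written θ_{k+1} = θ_k − h∇f(θ_k) + √(2h) ξ_k with ξ_k standard Gaussian, targeting a density proportional to exp(−f). The one-dimensional sketch below assumes a Gaussian target N(3, 1), so that ∇f(θ) = θ − 3, and a fixed step size; both are illustrative choices, while the guarantees discussed in the talk concern general smooth, strongly log-concave f.

```python
import math
import random


def langevin_monte_carlo(grad_f, theta0, step, n_steps, seed=0):
    """Unadjusted Langevin iterates for a target density proportional to exp(-f)."""
    rng = random.Random(seed)
    theta, path = theta0, []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        theta = theta - step * grad_f(theta) + math.sqrt(2.0 * step) * noise
        path.append(theta)
    return path
```

Because there is no Metropolis correction, the chain's stationary law differs from the target by an amount that shrinks with the step size h: for this Gaussian example the stationary mean is exactly 3, but the stationary variance is 1/(1 − h/2) rather than 1.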

## Optimization’s Implicit Gift to Learning: Understanding Optimization Bias as a Key to Generalization

Nathan Srebro-Bartom (TTI-Chicago)

Abstract: It is becoming increasingly clear that implicit regularization afforded by the optimization algorithms plays a central role in machine learning, and especially so when using large, deep, neural networks. We have a good understanding of the implicit regularization afforded by stochastic approximation algorithms, such as SGD, and as I will review, we understand and can characterize the implicit bias of different algorithms, and can design algorithms with specific biases. But in this talk I will focus on implicit biases of…

## March 2018

## One and two sided composite-composite tests in Gaussian mixture models

Alexandra Carpentier (Otto von Guericke Universitaet)

Abstract: Finding an efficient test for a testing problem is often linked to the problem of estimating a given function of the data. When this function is not smooth, it is necessary to approximate it cleverly in order to build good tests. In this talk, we will discuss two specific testing problems in Gaussian mixture models. In both, the aim is to test the proportion of null means. The aforementioned link between sharp approximation rates of non-smooth objects and minimax testing…

## Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment

Afonso Bandeira (NYU)

Abstract: Many problems in signal/image processing and computer vision amount to estimating a signal, image, or three-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is latent transformations of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in Robotics and Computer Vision, where pictures of a scene are obtained from different positions and orientations;…

## When Inference is tractable

David Sontag (MIT)

Abstract: A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as modules within deep architectures. But, when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AI Stats ’18), structure in…

## Statistical theory for deep neural networks with ReLU activation function

Johannes Schmidt-Hieber (Leiden)

Abstract: The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does, however, not guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown…

## April 2018

## Optimality of Spectral Methods for Ranking, Community Detections and Beyond

Jianqing Fan (Princeton University)

Abstract: Spectral methods have been widely used for a large class of challenging problems, including top-K ranking via pairwise comparisons, community detection and factor analysis. Analyses of these spectral methods require sup-norm perturbation analysis of top eigenvectors. This allows us to UNIFORMLY approximate elements in eigenvectors by linear functions of the observed random matrix that can be analyzed further. We first establish such an infinity-norm perturbation bound for top eigenvectors and apply the idea to several challenging problems…

## Testing degree corrections in Stochastic Block Models

Subhabrata Sen (Microsoft)

Abstract: The community detection problem has attracted significant attention in recent years, and it has been studied extensively under the framework of a Stochastic Block Model (SBM). However, it is well-known that SBMs fit real data very poorly, and various extensions have been suggested to replicate characteristics of real data. The recovered community assignments are often sensitive to the model used, and this naturally begs the following question: Given a network with community structure, how to decide whether…

## Inference, Computation, and Visualization for Convex Clustering and Biclustering

Genevera Allen (Rice)

Abstract: Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendrogram and cluster heatmap. Recently, several authors have proposed and studied convex clustering and biclustering which, similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendrogram. In the first part of the talk, we present new convex…

## May 2018

## Size-Independent Sample Complexity of Neural Networks

Ohad Shamir (Weizmann Institute)

Abstract: I'll describe new bounds on the sample complexity of deep neural networks, based on the norms of the parameter matrices at each layer. In particular, we show how certain norms lead to the first explicit bounds which are fully independent of the network size (both depth and width), and are therefore applicable to arbitrarily large neural networks. These results are derived using some novel techniques, which may be of independent interest. Joint work with Noah Golowich (Harvard) and Alexander…

## Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Adel Javanmard (USC)

Abstract: Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers’ valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers’ valuations, i.e., buyers’ preferences. The seller’s goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is…

## Fitting a putative manifold to noisy data

Hariharan Narayanan (Tata Institute of Fundamental Research, Mumbai)

Abstract: We give a solution to the following question from manifold learning. Suppose data belonging to a high-dimensional Euclidean space are drawn independently and identically distributed from a measure supported on a low-dimensional, twice differentiable, embedded compact manifold M, and are corrupted by a small amount of i.i.d. Gaussian noise. How can we produce a manifold whose Hausdorff distance to M is small and whose reach (normal injectivity radius) is not much smaller than the reach of M? This…

## August 2018

## Resource-efficient ML in 2 KB RAM for the Internet of Things

Prateek Jain (Microsoft Research)

Abstract: We propose an alternative paradigm for the Internet of Things (IoT) where machine learning algorithms run locally on severely resource-constrained edge and endpoint devices without necessarily needing cloud connectivity. This enables many scenarios beyond the pale of the traditional paradigm including low-latency brain implants, precision agriculture on disconnected farms, privacy-preserving smart spectacles, etc. Towards this end, we develop novel tree- and kNN-based algorithms, called Bonsai and ProtoNN, for efficient prediction on IoT devices, such as those based…

## September 2018

## Variational problems on random structures and their continuum limits

Dejan Slepčev (Carnegie Mellon University)

Abstract: We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts, graph Laplacians…
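
To make the graph construction concrete, the sketch below builds an ε-neighborhood graph on a point cloud (weight 1 when two points are within distance ε, 0 otherwise) and evaluates two functionals of a labeling on it: the graph cut (graph total variation) and the Dirichlet energy of the unnormalized graph Laplacian. The 0/1 kernel, the omission of the n- and ε-dependent scaling that continuum-limit results require, and all names are simplifying assumptions for illustration.

```python
import math


def epsilon_graph(points, eps):
    """Edges {(i, j): w_ij} for i < j, with w_ij = 1 when |x_i - x_j| < eps."""
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                edges[(i, j)] = 1.0
    return edges


def graph_cut(edges, u):
    """Graph total variation: sum over edges of w_ij * |u_i - u_j|."""
    return sum(w * abs(u[i] - u[j]) for (i, j), w in edges.items())


def dirichlet_energy(edges, u):
    """Quadratic form u^T L u of the unnormalized graph Laplacian L = D - W."""
    return sum(w * (u[i] - u[j]) ** 2 for (i, j), w in edges.items())
```

For a 0/1 labeling the two functionals coincide and count the edges cut between the two classes; they differ for real-valued `u`, which is why cut-based and Laplacian-based objectives lead to different learning problems.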

## Reverse hypercontractivity beats measure concentration for information theoretic converses

Jingbo Liu (MIT)

Abstract: Concentration of measure refers to a collection of tools and results from analysis and probability theory that have been used in many areas of pure and applied mathematics. Arguably, the first data science application of measure concentration (under the name “blowing-up lemma”) is the proof of strong converses in multiuser information theory by Ahlswede, Gács and Körner in 1976. Since then, measure concentration has found applications in many other information theoretic problems, most notably the converse (impossibility) results in…

## October 2018

## Efficient Algorithms for the Graph Matching Problem in Correlated Random Graphs

Tselil Schramm (Harvard University)

Abstract: The Graph Matching problem is a robust version of the Graph Isomorphism problem: given two not-necessarily-isomorphic graphs, the goal is to find a permutation of the vertices which maximizes the number of common edges. We study a popular average-case variant; we deviate from the common heuristic strategy and give the first quasi-polynomial time algorithm, where previously only sub-exponential time algorithms were known. Based on joint work with Boaz Barak, Chi-Ning Chou, Zhixian Lei, and Yueqi Sheng. Biography: Tselil Schramm is a postdoc in theoretical…

## Locally private estimation, learning, inference, and optimality

John Duchi (Stanford University)

Abstract: In this talk, we investigate statistical learning and estimation under local privacy constraints, where data providers do not trust the collector of the data and so privatize their data before it is even collected. We identify fundamental tradeoffs between statistical utility and privacy in such local models of privacy, providing instance-specific bounds for private estimation and learning problems by developing local minimax risks. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to…

## Algorithmic thresholds for tensor principal component analysis

Aukosh Jagannath (Harvard University)

Abstract: Consider the problem of recovering a rank 1 tensor of order k that has been subject to Gaussian noise. The log-likelihood for this problem is highly non-convex. It is information theoretically possible to recover the tensor with a finite number of samples via maximum likelihood estimation, however, it is expected that one needs a polynomially diverging number of samples to efficiently recover it. What is the cause of this large statistical–to–algorithmic gap? To study this question, we investigate the…

## On the cover time of two classes of graph

Alan Frieze (Carnegie Mellon University)

Abstract: Dense Graphs: We consider arbitrary graphs G with n vertices and minimum degree at least $\delta n$, where $\delta > 0$ is constant. If the conductance of G is sufficiently large then we obtain an asymptotic expression for the cover time $C_G$ of G as the solution to some explicit transcendental equation. Failing this, if the mixing time of a random walk on G is of a lesser magnitude than the cover time, then we can obtain an asymptotic deterministic…

## November 2018

## Joint estimation of parameters in Ising Model

Sumit Mukherjee (Columbia University)

Abstract: Inference in the framework of Ising models has received significant attention in Statistics and Machine Learning in recent years. In this talk we study joint estimation of the inverse temperature parameter β, and the magnetization parameter B, given one realization from the Ising model, under the assumption that the underlying graph of the Ising model is completely specified. We show that if the graph is either irregular or sparse, then both the parameters can be estimated at rate $n^{-1/2}$…

## Optimal hypothesis testing for stochastic block models with growing degrees

Zongming Ma (University of Pennsylvania)

Abstract: In this talk, we discuss optimal hypothesis testing for distinguishing a stochastic block model from an Erdős–Rényi random graph when the average degree grows to infinity with the graph size. We show that linear spectral statistics based on Chebyshev polynomials of the adjacency matrix can approximate signed cycles of growing lengths when the graph is sufficiently dense. The signed cycles have been shown by Banerjee (2018) to determine the likelihood ratio statistic asymptotically. In this way one achieves sharp…

## Model-X knockoffs for controlled variable selection in high dimensional nonlinear regression

Lucas Janson (Harvard University)

Abstract: Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new…

## Bias Reduction and Asymptotic Efficiency in Estimation of Smooth Functionals of High-Dimensional Covariance

Vladimir Koltchinskii (Georgia Institute of Technology)

Abstract: We discuss a recent approach to bias reduction in a problem of estimation of smooth functionals of high-dimensional parameters of statistical models. In particular, this approach has been developed in the case of estimation of functionals of the covariance operator $\Sigma : \mathbb{R}^d \to \mathbb{R}^d$ of the form $\langle f(\Sigma), B \rangle$ based on $n$ i.i.d. observations $X_1, \dots, X_n$ sampled from the normal distribution with mean zero and covariance $\Sigma$, $f : \mathbb{R} \to \mathbb{R}$ being a…

## December 2018

## Reducibility and Computational Lower Bounds for Some High-dimensional Statistics Problems

Guy Bresler (MIT)

Abstract: The prototypical high-dimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit an intriguing phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these gaps by reducing from conjecturally hard problems in computer science. However, the delicate nature of average-case reductions has…

## Large girth approximate Steiner triple systems

Lutz Warnke (Georgia Institute of Technology)

Abstract: In 1973 Erdős asked whether there are n-vertex partial Steiner triple systems with arbitrarily high girth and quadratically many triples. (Here girth is defined as the smallest integer $g \ge 4$ for which some $g$-element vertex-set contains at least $g-2$ triples.) We answer this question by showing the existence of approximate Steiner triple systems with arbitrarily high girth. More concretely, for any fixed $\ell \ge 4$ we show that a natural constrained random process typically produces a partial Steiner triple…

## February 2019

## Medical Image Imputation

Polina Golland (MIT CSAIL)

Abstract: We present an algorithm for creating high resolution anatomically plausible images that are consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large databases of clinical images contain a wealth of information, medical acquisition constraints result in sparse scans that miss much of the anatomy. These characteristics often render computational analysis impractical as standard processing algorithms tend to fail when applied to such images. Our goal is to enable application of existing algorithms that were originally…

## Capacity lower bound for the Ising perceptron

Nike Sun (MIT)

Abstract: The perceptron is a toy model of a simple neural network that stores a collection of given patterns. Its analysis reduces to a simple problem in high-dimensional geometry, namely, understanding the intersection of the cube (or sphere) with a collection of random half-spaces. Despite the simplicity of this model, its high-dimensional asymptotics are not well understood. I will describe what is known and present recent results. This is joint work with Jian Ding. Biography: Nike Sun is a…

## March 2019

## Why Aren’t Network Statistics Accompanied By Uncertainty Statements?

Eric Kolaczyk (Boston University)

Abstract: Over 500K scientific articles have been published since 1999 with the word “network” in the title, and the vast majority of these report network summary statistics of one type or another. However, these numbers are rarely accompanied by any quantification of uncertainty. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, necessarily propagates to any summary statistics reported. Perhaps surprisingly, there is little in the way of…

## Univariate total variation denoising, trend filtering and multivariate Hardy-Krause variation denoising

Aditya Guntuboyina (UC Berkeley)

Abstract: Total variation denoising (TVD) is a popular technique for nonparametric function estimation. I will first present a theoretical optimality result for univariate TVD for estimating piecewise constant functions. I will then present related results for various extensions of univariate TVD, including adaptive risk bounds for higher-order TVD (also known as trend filtering) as well as a multivariate extension via the Hardy-Krause variation, which avoids the curse of dimensionality to some extent. I will also mention connections to shape restricted…
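Univariate TVD solves min over x of 0.5 * Σ (y_i - x_i)^2 + λ * Σ |x_{i+1} - x_i|. A minimal sketch, assuming a plain projected-gradient method on the dual problem (one of several standard solvers, not necessarily the one used in the talk): the dual variable u lives in the box [-λ, λ]^(n-1), and the fit is recovered as x = y - D^T u, where D is the first-difference matrix. The function name is illustrative.

```python
# Illustrative sketch; the dual projected-gradient solver is an assumed choice.
def tv_denoise_1d(y, lam, iters=5000):
    """Minimize 0.5 * sum((y_i - x_i)^2) + lam * sum(|x_{i+1} - x_i|)."""
    n = len(y)
    u = [0.0] * (n - 1)  # one dual variable per adjacent difference

    def primal(u):
        # x = y - D^T u, where (D x)_i = x_{i+1} - x_i
        x = [float(v) for v in y]
        for i, ui in enumerate(u):
            x[i] += ui
            x[i + 1] -= ui
        return x

    step = 0.25  # valid step size: ||D D^T|| < 4 for first differences
    for _ in range(iters):
        x = primal(u)
        # gradient step on the dual objective, then projection onto [-lam, lam]
        u = [min(lam, max(-lam, ui + step * (x[i + 1] - x[i])))
             for i, ui in enumerate(u)]
    return primal(u)

print([round(v, 3) for v in tv_denoise_1d([0, 0, 0, 5, 5, 5], lam=1.5)])
# [0.5, 0.5, 0.5, 4.5, 4.5, 4.5]
```

The output shows the behavior typical of TVD on piecewise constant signals: the jump location is preserved, while each level is shrunk toward the other by λ divided by the segment length.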

## Subvector Inference in Partially Identified Models with Many Moment Inequalities

Alex Belloni (Duke University)

Abstract: In this work we consider bootstrap-based inference methods for functions of the parameter vector in the presence of many moment inequalities, where the number of moment inequalities, denoted by p, is possibly much larger than the sample size n. In particular, this covers subvector inference, such as inference on a single component associated with a treatment/policy variable of interest. We consider a min-max of (centered and non-centered) Studentized statistics and study the properties of the…

## Optimization of random polynomials on the sphere in the full-RSB regime

Eliran Subag (New York University)

Abstract: The talk will focus on optimization on the high-dimensional sphere when the objective function is a linear combination of homogeneous polynomials with standard Gaussian coefficients. Such random processes are called spherical spin glasses in physics and have been extensively studied since the 1980s. I will describe certain geometric properties of spherical spin glasses unique to the full-RSB case, and explain how they can be used to design a polynomial-time algorithm that finds points within a small multiplicative error from…

## April 2019

## Exponential line-crossing inequalities

Aaditya Ramdas (Carnegie Mellon University)

Abstract: This talk will present a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We will illustrate this point by presenting a single assumption and a single theorem that together strengthen many tail bounds for martingales, including classical inequalities (1960-80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980-2000) by Shorack and Wellner,…
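A simple instance of a line-crossing bound, worked out here as an illustration rather than taken from the talk: if S_t is a sum of independent Rademacher (±1) steps, then exp(λ S_t - λ^2 t / 2) is a nonnegative supermartingale (because cosh λ ≤ exp(λ^2 / 2)), and Ville's maximal inequality with λ = 2b gives P(S_t ≥ a + b t for some t ≥ 1) ≤ exp(-2ab) for a, b > 0. The simulation below checks this over a finite horizon; the step distribution, horizon, trial count, and function names are arbitrary assumptions.

```python
import math
import random

# Illustrative sketch; Rademacher steps and all parameters are assumptions.
def crossing_bound(a, b):
    """exp(-2ab): bound on P(S_t >= a + b t for some t) for Rademacher steps."""
    return math.exp(-2.0 * a * b)

def empirical_crossing_prob(a, b, horizon=400, trials=2000, seed=1):
    """Fraction of simulated walks that cross the line a + b t by the horizon."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0
        for t in range(1, horizon + 1):
            s += 1 if rng.random() < 0.5 else -1
            if s >= a + b * t:
                hits += 1
                break
    return hits / trials

a, b = 2.0, 0.25
print(empirical_crossing_prob(a, b), crossing_bound(a, b))
```

The bound holds uniformly over time, which is what distinguishes a line-crossing inequality from a bound at a single fixed t.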

## Logistic Regression: The Importance of Being Improper

Dylan Foster (MIT Institute for Foundations of Data Science)

Abstract: Logistic regression is a fundamental task in machine learning and statistics. For the simple case of linear models, Hazan et al. (2014) showed that any logistic regression algorithm that estimates model weights from samples must exhibit exponential dependence on the weight magnitude. As an alternative, we explore a counterintuitive technique called improper learning, whereby one estimates a linear model by fitting a non-linear model. Past success stories for improper learning have focused on cases where it can improve computational…

## Robust Estimation: Optimal Rates, Computation and Adaptation

Chao Gao (University of Chicago)

Abstract: This talk concerns statistical estimation with contaminated data. In the first part of the talk, I will discuss depth-based approaches that achieve minimax rates in various problems. In general, the minimax rate of a given problem with contamination consists of two terms: the statistical complexity without contamination, and the contamination effect in the form of a modulus of continuity. In the second part of the talk, I will discuss computational challenges of these depth-based estimators. An…
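The effect of contamination is easy to see in one dimension, where the depth-based (Tukey median) estimator reduces to the sample median. The sketch below assumes a Huber-type model in which each point is replaced by an outlier at 50 with probability ε = 0.1, and compares the median with the sample mean. It illustrates the phenomenon only; the model parameters and function name are our own choices, not the analysis from the talk.

```python
import random
import statistics

# Illustrative sketch; the contamination model and parameters are assumptions.
def contaminated_sample(n, eps, theta, outlier, seed=0):
    """Draw n points from (1 - eps) * N(theta, 1) + eps * (point mass at outlier)."""
    rng = random.Random(seed)
    return [outlier if rng.random() < eps else rng.gauss(theta, 1.0)
            for _ in range(n)]

data = contaminated_sample(10_000, eps=0.1, theta=0.0, outlier=50.0)
print(statistics.mean(data))    # pulled to roughly eps * outlier = 5
print(statistics.median(data))  # stays within O(eps) of theta = 0
```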

## May 2019

## Optimal Adaptivity of Signed-Polygon Statistics for Network Testing

Tracy Ke (Harvard University)

Abstract: Given a symmetric social network, we are interested in testing whether it has only one community or multiple communities. The desired tests should (a) accommodate severe degree heterogeneity, (b) accommodate mixed memberships, (c) have a tractable null distribution, and (d) adapt automatically to different levels of sparsity and achieve the optimal detection boundary. Finding such a test is a challenging problem. We propose the Signed Polygon statistics as a new class of tests. Fix m ≥ 3. For each m-gon…

## Counting and sampling at low temperatures

Will Perkins (University of Illinois at Chicago)

Abstract: We consider the problem of efficient sampling from the hard-core and Potts models from statistical physics. On certain families of graphs, phase transitions in the underlying physics model are linked to changes in the performance of some sampling algorithms, including Markov chains. We develop new sampling and counting algorithms that exploit the phase transition phenomenon and work efficiently on lattices (and bipartite expander graphs) at sufficiently low temperatures in the phase coexistence regime. Our algorithms are based on Pirogov-Sinai…
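For reference, the classical Markov chain for the hard-core model with fugacity λ is heat-bath Glauber dynamics, sketched below for a graph given as adjacency lists (function and parameter names are our own). The abstract notes that the performance of sampling algorithms such as Markov chains is tied to phase transitions; the Pirogov-Sinai-based algorithms from the talk are different and are not shown here.

```python
import random

# Illustrative sketch of standard Glauber dynamics; not the talk's algorithm.
def glauber_hardcore(adj, lam, steps, seed=0):
    """Heat-bath Glauber dynamics for the hard-core model.

    Target: P(I) proportional to lam ** |I| over independent sets I.
    Each step resamples one uniformly chosen vertex from its conditional law.
    """
    rng = random.Random(seed)
    occupied = [False] * len(adj)  # start from the empty independent set
    for _ in range(steps):
        v = rng.randrange(len(adj))
        if rng.random() < lam / (1.0 + lam):
            # occupy v only if no neighbor is occupied
            if not any(occupied[u] for u in adj[v]):
                occupied[v] = True
        else:
            occupied[v] = False
    return occupied

# 4 x 4 torus grid (a bipartite graph); vertex (r, c) has index r * side + c
side = 4
adj = [[((r + dr) % side) * side + (c + dc) % side
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
       for r in range(side) for c in range(side)]
state = glauber_hardcore(adj, lam=2.0, steps=10_000)
print(sum(state), "occupied vertices")
```

Every state the chain visits is an independent set, since a vertex is only occupied when all of its neighbors are empty.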

## September 2019

## GANs, Optimal Transport, and Implicit Density Estimation

Tengyuan Liang (University of Chicago)

Abstract: We first study the rate of convergence for learning distributions with the adversarial framework and Generative Adversarial Networks (GANs), which subsumes Wasserstein, Sobolev, and MMD GANs as special cases. We study a wide range of parametric and nonparametric target distributions, under a collection of objective evaluation metrics. On the nonparametric end, we investigate the minimax optimal rates and fundamental difficulty of the implicit density estimation under the adversarial framework. On the parametric end, we establish a theory for general…
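MMD GANs, one of the special cases named above, compare the generator and data distributions through the maximum mean discrepancy. A minimal sketch of the standard unbiased MMD^2 estimator for one-dimensional samples, assuming a Gaussian kernel with bandwidth 1 (the kernel, bandwidth, sample sizes, and function names are illustrative choices):

```python
import math
import random

# Illustrative sketch; kernel choice and bandwidth are assumptions.
def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between the distributions of xs and ys."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

rng = random.Random(0)
p = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_shift = [rng.gauss(2.0, 1.0) for _ in range(200)]
print(mmd2_unbiased(p, q_same))   # close to 0
print(mmd2_unbiased(p, q_shift))  # clearly positive
```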

## Automated Data Summarization for Scalability in Bayesian Inference

Tamara Broderick (MIT)

IDS.190 - Topics in Bayesian Modeling and Computation Abstract: Many algorithms take prohibitively long to run on modern, large datasets. But even in complex datasets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. Running algorithms on a much smaller dataset will typically take much less computing time, but…
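The simplest weighted subset is a uniform subsample in which each retained point gets weight n/m, so the weighted log-likelihood is an unbiased estimate of the full one. The sketch below uses a Gaussian location model as an assumed example; it is only a baseline, with illustrative function names, and does not implement the automated coreset constructions the talk is about.

```python
import math
import random

# Illustrative baseline; the model and names are assumptions, not the talk's method.
def gaussian_loglik(mu, xs, weights=None):
    """(Weighted) log-likelihood of N(mu, 1) for the points xs."""
    if weights is None:
        weights = [1.0] * len(xs)
    return sum(w * (-0.5 * math.log(2.0 * math.pi) - 0.5 * (x - mu) ** 2)
               for x, w in zip(xs, weights))

def uniform_coreset(xs, m, seed=0):
    """Uniformly subsample m points and weight each by n / m."""
    rng = random.Random(seed)
    points = rng.sample(xs, m)
    return points, [len(xs) / m] * m

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
points, weights = uniform_coreset(data, m=1_000)
full = gaussian_loglik(0.5, data)
approx = gaussian_loglik(0.5, points, weights)
print(full, approx, abs(approx - full) / abs(full))
```

Uniform sampling ignores how much each point matters for the likelihood; automated constructions aim to do better at the same subset size.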

## Probabilistic Modeling meets Deep Learning using TensorFlow Probability

Brian Patton (Google AI)

IDS.190 - Topics in Bayesian Modeling and Computation Speaker: Brian Patton (Google AI) Abstract: TensorFlow Probability provides a toolkit to enable researchers and practitioners to integrate uncertainty with gradient-based deep learning on modern accelerators. In this talk we'll walk through some practical problems addressed using TFP; discuss the high-level interfaces, goals, and principles of the library; and touch on some recent innovations in describing probabilistic graphical models. Time permitting, we may touch on a couple of areas of research interest for the…

## Some New Insights On Transfer Learning

Samory Kpotufe (Columbia)

Abstract: The problem of transfer and domain adaptation is ubiquitous in machine learning and concerns situations where predictive technologies, trained on a given source dataset, have to be transferred to a new target domain that is somewhat related. For example, a voice recognition system trained on American English accents may need to be applied to Scottish accents with minimal retraining. A first challenge is to understand how to properly model the ‘distance’ between source and target domains, viewed as probability distributions over a feature…

## Frontiers of Efficient Neural-Network Learnability

Adam Klivans (UT Austin)

Abstract: What are the most expressive classes of neural networks that can be learned, provably, in polynomial time in a distribution-free setting? In this talk we give the first efficient algorithm for learning neural networks with two nonlinear layers using tools for solving isotonic regression, a nonconvex (but tractable) optimization problem. If we further assume the distribution is symmetric, we obtain the first efficient algorithm for recovering the parameters of a one-layer convolutional network. These results implicitly make use of a…
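Isotonic regression in its basic form, least-squares fitting of a non-decreasing sequence to data, is solved exactly by the pool-adjacent-violators algorithm (PAVA). The sketch below shows only this classical subroutine, with an illustrative function name; how the talk combines isotonic regression with the rest of its learning algorithm is not reproduced here.

```python
# Illustrative sketch of PAVA; not the full learning algorithm from the talk.
def isotonic_regression(y, w=None):
    """Weighted least-squares non-decreasing fit to the sequence y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # pool adjacent blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

print(isotonic_regression([1, 3, 2, 4, 3, 5]))  # [1, 2.5, 2.5, 3.5, 3.5, 5]
```

Each violating pair is replaced by its weighted average, and a single left-to-right pass with backtracking yields the exact minimizer.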

## October 2019

## Behavior of the Gibbs Sampler in the Imbalanced Case/Bias Correction from Daily Min and Max Temperature Measurements

Natesh Pillai (Harvard)

IDS.190 - Topics in Bayesian Modeling and Computation Note: The speaker this week will give two shorter talks within the usual session. Title: Behavior of the Gibbs sampler in the imbalanced case. Abstract: Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and…

## Probabilistic Programming and Artificial Intelligence

Vikash Mansinghka (MIT)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: Probabilistic programming is an emerging field at the intersection of programming languages, probability theory, and artificial intelligence. This talk will show how to use recently developed probabilistic programming languages to build systems for robust 3D computer vision, without requiring any labeled training data; for automatic modeling of complex real-world time series; and for machine-assisted analysis of experimental data that is too small and/or messy for standard approaches from machine learning and…

## The Planted Matching Problem

Cristopher Moore (Santa Fe Institute)

Abstract: What happens when an optimization problem has a good solution built into it, but one that is partly obscured by randomness? Here we revisit a classic polynomial-time problem, the minimum perfect matching problem on bipartite graphs. If the edges have random weights in [0, 1], Mézard and Parisi, and then Aldous rigorously, showed that the minimum matching has expected weight approaching ζ(2) = π²/6 as the graph grows. We consider a “planted” version where a particular matching has weights drawn from an exponential distribution…
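The unplanted baseline can be checked numerically for small graphs. A sketch, assuming i.i.d. exponential weights with mean 1 instead of uniform weights: for this distribution, the expected minimum perfect matching weight of an n-by-n bipartite graph is known exactly to be 1 + 1/4 + ... + 1/n², which tends to ζ(2) = π²/6, the same limit as for uniform weights on [0, 1]. The brute-force search over permutations only scales to very small n, and the function names are our own.

```python
import itertools
import math
import random

# Illustrative sketch; exponential weights and brute force are assumptions.
def min_matching_weight(w):
    """Minimum-weight perfect matching of an n x n weight matrix (brute force)."""
    n = len(w)
    return min(sum(w[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))

def mean_min_matching(n, trials=4000, seed=0):
    """Monte Carlo average of the minimum matching weight with Exp(1) weights."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
        total += min_matching_weight(w)
    return total / trials

n = 4
exact = sum(1.0 / k ** 2 for k in range(1, n + 1))  # 1 + 1/4 + 1/9 + 1/16
print(mean_min_matching(n), exact, math.pi ** 2 / 6)
```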

## Markov Chain Monte Carlo Methods and Some Attempts at Parallelizing Them

Pierre E. Jacob (Harvard University)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: MCMC methods yield approximations that converge to quantities of interest as the number of iterations tends to infinity. This iterative asymptotic justification is not ideal: it is at odds with current trends in computing hardware. Namely, it would often be computationally preferable to run many short chains in parallel, but such an approach is flawed because of the so-called "burn-in" bias. This talk will first describe that issue and some known…
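The burn-in bias is visible even for a toy chain. The sketch below assumes a Gaussian AR(1) chain X_{t+1} = ρ X_t + sqrt(1 - ρ²) ε_t with standard normal ε_t, whose stationary law is N(0, 1); started at x0, its mean after t steps is ρ^t x0. Averaging the endpoints of many short chains run in parallel reduces variance but converges to this biased value, not to 0. The chain and names are illustrative assumptions, and the remedies discussed in the talk are not implemented here.

```python
import random
import statistics

# Illustrative toy chain; the AR(1) model and parameters are assumptions.
def ar1_endpoint(x0, rho, steps, rng):
    """Run X_{t+1} = rho * X_t + sqrt(1 - rho^2) * N(0, 1) for `steps` steps."""
    x = x0
    noise_sd = (1.0 - rho ** 2) ** 0.5
    for _ in range(steps):
        x = rho * x + rng.gauss(0.0, noise_sd)
    return x

rng = random.Random(0)
rho, x0, steps = 0.9, 10.0, 10
endpoints = [ar1_endpoint(x0, rho, steps, rng) for _ in range(20_000)]
bias = rho ** steps * x0  # expected endpoint, about 3.49; the stationary mean is 0
print(statistics.fmean(endpoints), bias)
```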

## Towards Robust Statistical Learning Theory

Stanislav Minsker (USC)

Abstract: Real-world data typically do not fit statistical models or satisfy the assumptions underlying the theory exactly; hence, reducing the number and strictness of these assumptions helps to lessen the gap between the “mathematical” world and the “real” world. The concept of robustness, in particular robustness to outliers, plays a central role in understanding this gap. The goal of the talk is to introduce the underlying principles, together with robust algorithms based on them, that can be applied in the general framework of statistical…
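One standard way to build estimators that tolerate outliers and heavy tails is the median-of-means construction: split the sample into k blocks, average within each block, and take the median of the block means. The sketch below is a generic illustration of that idea (the block count, injected outliers, and function name are arbitrary assumptions), not a specific algorithm from the talk.

```python
import random
import statistics

# Illustrative median-of-means sketch; parameters are assumptions.
def median_of_means(xs, k):
    """Median of the means of k consecutive, equal-size blocks (remainder dropped)."""
    b = len(xs) // k
    block_means = [statistics.fmean(xs[i * b:(i + 1) * b]) for i in range(k)]
    return statistics.median(block_means)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
for i in rng.sample(range(len(data)), 5):  # corrupt five points
    data[i] = 1e6
print(statistics.fmean(data))       # ruined by the outliers (about 500)
print(median_of_means(data, k=21))  # at most 5 of 21 blocks are affected
```

As long as fewer than half of the blocks contain an outlier, the median ignores the corrupted block means entirely.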
