## February 2017

## Slope meets Lasso in sparse linear regression

Pierre Bellec (Rutgers)

Abstract: We will present results on two convex regularized estimators for sparse linear regression, the Lasso and the recently introduced Slope estimator, in the high-dimensional setting where the number of covariates p is larger than the number of observations n. The estimation and prediction performance of these estimators will be presented, as well as a comparative study of the assumptions on the design matrix. https://arxiv.org/pdf/1605.08651.pdf Biography: I am an Assistant Professor of statistics at Rutgers, the State University of New Jersey. I obtained my PhD…
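
The two penalties the abstract contrasts can be written down in a few lines. Below is a minimal numpy sketch (the coefficient vector and the decreasing weight sequence are made up for illustration): the Lasso charges every coefficient the same rate λ, while Slope charges the i-th largest coefficient the i-th largest weight.

```python
import numpy as np

def lasso_penalty(beta, lam):
    """Lasso penalty: lam * sum_i |beta_i|."""
    return lam * np.sum(np.abs(beta))

def slope_penalty(beta, lams):
    """Slope penalty: sum_i lam_i * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...
    and lam_1 >= lam_2 >= ... >= 0, so larger coefficients pay larger weights."""
    abs_sorted = np.sort(np.abs(beta))[::-1]        # coefficients in decreasing order
    return float(np.dot(np.sort(lams)[::-1], abs_sorted))

beta = np.array([3.0, -1.0, 0.0, 0.5])
lams = np.array([1.0, 0.75, 0.5, 0.25])             # illustrative decreasing weights
print(lasso_penalty(beta, 1.0))    # 4.5
print(slope_penalty(beta, lams))   # 1.0*3 + 0.75*1 + 0.5*0.5 + 0.25*0 = 4.0
```

With all weights equal, the Slope penalty reduces exactly to the Lasso penalty.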

## Causal Discovery in Systems with Feedback Cycles

Frederick Eberhardt (Caltech)

Abstract: While causal relations are generally considered to be anti-symmetric, we often find that over time there are feedback systems such that a variable can have a causal effect on itself. Such "cyclic" causal systems pose significant challenges for causal analysis, both in terms of the appropriate representation of the system under investigation, and for the development of algorithms that attempt to infer as much as possible about the underlying causal system from statistical data. This talk will aim to provide some theoretical insights about…

## Estimating the number of connected components of large graphs based on subgraph sampling

Yihong Wu (Yale)

Abstract: Learning properties of large graphs from samples is an important problem in statistical network analysis, dating back to the early work of Goodman and Frank. We revisit the problem formulated by Frank (1978) of estimating the numbers of connected components in a graph of N vertices based on the subgraph sampling model, where we observe the subgraph induced by n vertices drawn uniformly at random. The key question is whether it is possible to achieve accurate estimation, i.e., vanishing normalized mean-square error,…

## March 2017

## Computing partition functions by interpolation

Alexander Barvinok (University of Michigan)

Abstract: Partition functions are just multivariate polynomials with a great many monomials enumerating combinatorial structures of a particular type, and their efficient computation (approximation) is of interest for combinatorics, statistics, physics and computational complexity. I’ll present a general principle: the partition function can be efficiently approximated in a domain if it has no complex zeros in a slightly larger domain, and illustrate it on the examples of the permanent of a matrix, the independence polynomial of a graph and, time permitting, the graph homomorphism partition…

## Jagers-Nerman stable age distribution theory, change point detection and power of two choices in evolving networks

Shankar Bhamidi (UNC)

Abstract: (i) Change point detection for networks: We consider the preferential attachment model. We formulate and study the regime where the network transitions from one evolutionary scheme to another. In the large network limit we derive asymptotics for various functionals of the network including degree distribution and maximal degree. We study functional central limit theorems for the evolution of the degree distribution which feed into proving consistency of a proposed estimator of the change point. (ii) Power of choice and network…

## May 2017

## Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

Vianney Perchet (ENS Paris-Saclay)

Abstract: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning with bandit feedback. In this problem, the objective is to minimize a global, not necessarily cumulative, convex loss function. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques ranging from bandits to convex optimization. We identify slow and fast of…
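
The talk's algorithm augments Frank-Wolfe with upper-confidence estimates of the gradient; as background, here is a minimal sketch of the classical full-information Frank-Wolfe iteration it builds on, run on a toy quadratic over the probability simplex (the objective and step-size schedule are the standard textbook choices, not the bandit variant from the talk):

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=2000):
    """Classical Frank-Wolfe on the probability simplex: the linear minimization
    oracle simply picks the coordinate with the smallest gradient entry,
    i.e. a vertex of the simplex."""
    x = x0.copy()
    for t in range(1, steps + 1):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # best vertex for the linearized objective
        gamma = 2.0 / (t + 2.0)        # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x

# toy objective f(x) = ||x - p||^2, minimized over the simplex at x = p
p = np.array([0.1, 0.6, 0.3])
x = frank_wolfe_simplex(lambda v: 2 * (v - p), np.ones(3) / 3)
print(np.round(x, 3))   # close to p
```

In the bandit setting of the talk, the exact gradient above is replaced by an upper-confidence estimate built from noisy evaluations.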

## September 2017

## New provable techniques for learning and inference in probabilistic graphical models

Andrej Risteski (Princeton University)

Abstract: A common theme in machine learning is succinct modeling of distributions over large domains. Probabilistic graphical models are one of the most expressive frameworks for doing this. The two major tasks involving graphical models are learning and inference. Learning is the task of calculating the "best fit" model parameters from raw data, while inference is the task of answering probabilistic queries for a model with known parameters (e.g. what is the marginal distribution of a subset of variables, after…

## Sample complexity of population recovery

Yury Polyanskiy (MIT)

Abstract: In this talk we will first consider a general question of estimating linear functional of the distribution based on the noisy samples from it. We discover that the (two-point) LeCam lower bound is in fact achievable by optimizing bias-variance tradeoff of an empirical-mean type of estimator. Next, we apply this general framework to the specific problem of population recovery. Namely, consider a random poll of sample size n conducted on a population of individuals, where each pollee is asked to…

## Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

Jelani Nelson (Harvard University)

Abstract: Consider the following problem: we monitor a sequence of edge insertions and deletions in a graph on n vertices, so there are N = (n choose 2) possible edges (e.g. monitoring a stream of friend accepts/removals on Facebook). At any point someone may say "query()", at which point we must output a random edge that exists in the graph at that time from a distribution that is statistically close to uniform. More specifically, with probability p our edge should come from a distribution close to uniform,…

## October 2017

## Transport maps for Bayesian computation

Youssef Marzouk (MIT)

Abstract: Integration against an intractable probability measure is among the fundamental challenges of Bayesian inference. A useful approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Approximate transports can also be used to "precondition" standard Monte Carlo schemes. Yet characterizing a…
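
In one dimension the coupling the abstract describes can be written in closed form: the increasing rearrangement T = F_target⁻¹ ∘ Φ pushes a standard Gaussian reference forward to any target with CDF F_target. A minimal sketch with an Exponential(1) target (chosen purely for illustration):

```python
import numpy as np
from math import erf

def gaussian_cdf(z):
    # standard normal CDF, vectorized over numpy arrays
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

def transport_map(z):
    """Increasing 1-D transport pushing N(0,1) forward to Exponential(1):
    T = F_target^{-1} composed with the Gaussian CDF."""
    u = gaussian_cdf(z)
    return -np.log1p(-u)              # inverse CDF of Exp(1)

rng = np.random.default_rng(0)
z = rng.standard_normal(50_000)       # samples from the tractable reference
x = transport_map(z)                  # exact samples from the target measure
print(x.mean(), x.var())              # both close to 1 for Exp(1)
```

Direct simulation from the target is then exactly "evaluate the transport map at reference samples," as in the abstract; the talk's interest is in approximating such maps in high dimensions.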

## Additivity of Information in Deep Generative Networks: The I-MMSE Transform Method

Galen Reeves (Duke University)

Abstract: Deep generative networks are powerful probabilistic models that consist of multiple stages of linear transformations (described by matrices) and non-linear, possibly random, functions (described generally by information channels). These models have gained great popularity due to their ability to characterize complex probabilistic relationships arising in a wide variety of inference problems. In this talk, we introduce a new method for analyzing the fundamental limits of statistical inference in settings where the model is known. The validity of our method can…

## Structure in multi-index tensor data: a trivial byproduct of simpler phenomena?

John Cunningham (Columbia)

Abstract: As large tensor-variate data become increasingly common across applied machine learning and statistics, complex analysis methods for these data similarly increase in prevalence. Such a trend offers the opportunity to understand subtler and more meaningful features of the data that, ostensibly, could not be studied with simpler datasets or simpler methodologies. While promising, these advances are also perilous: novel analysis techniques do not always consider the possibility that their results are in fact an expected consequence of some simpler, already-known…

## Inference in dynamical systems and the geometry of learning group actions

Sayan Mukherjee (Duke)

Abstract: We examine consistency of the Gibbs posterior for dynamical systems using a classical idea in dynamical systems called the thermodynamic formalism in tracking dynamical systems. We state a variational formulation under which there is a unique posterior distribution of parameters as well as hidden states, using classic ideas from dynamical systems such as pressure and joinings. We use an example of consistency of hidden Markov models with infinite lags as an application of our theory. We develop a geometric framework that characterizes…

## On Learning Theory and Neural Networks

Amit Daniely (Google)

Abstract: Can learning theory, as we know it today, form a theoretical basis for neural networks? I will try to discuss this question in light of two new results -- one positive and one negative. Based on joint work with Roy Frostig, Vineet Gupta and Yoram Singer, and with Vitaly Feldman. Biography: Amit Daniely is an Assistant Professor at the Hebrew University in Jerusalem, and a research scientist at Google Research, Tel-Aviv. Prior to that, he was a research scientist at Google Research, Mountain View. Even…

## November 2017

## Unbiased Markov chain Monte Carlo with couplings

Pierre Jacob (Harvard)

Abstract: Markov chain Monte Carlo methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, these estimators are generally biased after any fixed number of iterations, which complicates parallel computation. In this talk I will explain how to remove this burn-in bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and averaged. I will present…

## Statistics, Computation and Learning with Graph Neural Networks

Joan Bruna Estrach (NYU)

Abstract: Deep Learning, thanks mostly to Convolutional architectures, has recently transformed computer vision and speech recognition. Their ability to encode geometric stability priors, while offering enough expressive power, is at the core of their success. In such settings, geometric stability is expressed in terms of local deformations, and it is enforced thanks to localized convolutional operators that separate the estimation into scales. Many problems across applied sciences, from particle physics to recommender systems, are formulated in terms of signals defined over…

## Generative Models and Compressed Sensing

Alex Dimakis (University of Texas at Austin)

Abstract: The goal of compressed sensing is to estimate a vector from an under-determined system of noisy linear measurements, by making use of prior knowledge in the relevant domain. For most results in the literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we assume that the unknown vectors lie near the range of a generative model, e.g. a GAN…
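
A minimal sketch of this idea, with every ingredient made up for illustration: a small fixed two-layer tanh network stands in for the generative model, measurements are random and noiseless, and the latent code is recovered by plain gradient descent (finite differences, since the latent dimension is tiny). This is not the talk's algorithm, just the "minimize ||A G(z) − y||" recipe it analyzes.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, m = 4, 60, 20                        # latent dim, signal dim, measurements

# hypothetical fixed two-layer generator G: R^k -> R^n (random weights, tanh units)
W1 = rng.normal(size=(30, k)) / np.sqrt(k)
W2 = rng.normal(size=(n, 30)) / np.sqrt(30)
G = lambda z: W2 @ np.tanh(W1 @ z)

A = rng.normal(size=(m, n)) / np.sqrt(n)   # m << n random linear measurements
z_true = rng.normal(size=k)
y = A @ G(z_true)                          # noiseless measurements of G(z_true)

def loss(z):
    return float(np.sum((A @ G(z) - y) ** 2))

# recover z by gradient descent on the measurement error
z = np.zeros(k)
loss0 = loss(z)
eps, lr = 1e-4, 0.02
for _ in range(5000):
    g = np.array([(loss(z + eps * e) - loss(z - eps * e)) / (2 * eps)
                  for e in np.eye(k)])     # finite-difference gradient, k is small
    z -= lr * g
final = loss(z)
print(loss0, "->", final)                  # measurement error drops substantially
```

Note there is no sparsity anywhere: the only structural assumption is that the signal lies in (here, exactly on) the range of G.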

## December 2017

## Challenges in Developing Learning Algorithms to Personalize Treatment in Real Time

Susan Murphy (Harvard)

Abstract: A formidable challenge in designing sequential treatments is to determine when and in which context it is best to deliver treatments. Consider treatment for individuals struggling with chronic health conditions. Operationally designing the sequential treatments involves the construction of decision rules that input current context of an individual and output a recommended treatment. That is, the treatment is adapted to the individual's context; the context may include current health status, current level of social support and current level of adherence…

## Genome-wide association, phenotype prediction, and population structure: a review and some open problems

Alex Bloemendal (Broad Institute)

Abstract: I will give a broad overview of human genetic variation, polygenic traits, association studies, heritability estimation and risk prediction. I will focus on the dual correlation structures of linkage disequilibrium and population structure, discussing how these both confound and enable the various analyses we perform. I will highlight an important open problem on the failure of polygenic risk prediction to generalize across diverse ancestries. Biography: Alex Bloemendal is a computational scientist at the Broad Institute of MIT and Harvard…

## February 2018

## Connections between structured estimation and weak submodularity

Sahand Negahban (Yale University)

Abstract: Many modern statistical estimation problems rely on imposing additional structure in order to reduce the statistical complexity and provide interpretability. Unfortunately, these structures often are combinatorial in nature and result in computationally challenging problems. In parallel, the combinatorial optimization community has placed significant effort in developing algorithms that can approximately solve such optimization problems in a computationally efficient manner. The focus of this talk is to expand upon ideas that arise in combinatorial optimization and connect those algorithms and…

## Variable selection using presence-only data with applications to biochemistry

Garvesh Raskutti (University of Wisconsin)

Abstract: In a number of problems, we are presented with positive and unlabelled data, referred to as presence-only responses. The application I present today involves studying the relationship between protein sequence and function and presence-only data arises since for many experiments it is impossible to obtain a large set of negative (non-functional) sequences. Furthermore, if the number of variables is large and the goal is variable selection (as in this case), a number of statistical and computational challenges arise due…

## User-friendly guarantees for the Langevin Monte Carlo

Arnak Dalalyan (ENSAE-CREST)

Abstract: In this talk, I will revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm of sampling from a smooth and (strongly) log-concave density. I will discuss the existing results when the accuracy of sampling is measured in the Wasserstein distance and provide further insights on relations between, on the one hand, the Langevin Monte Carlo for sampling and, on the other hand, the gradient descent for optimization. I will also present non-asymptotic guarantees for the accuracy…

## Optimization’s Implicit Gift to Learning: Understanding Optimization Bias as a Key to Generalization

Nathan Srebro-Bartom (TTI-Chicago)

Abstract: It is becoming increasingly clear that the implicit regularization afforded by optimization algorithms plays a central role in machine learning, and especially so when using large, deep, neural networks. We have a good understanding of the implicit regularization afforded by stochastic approximation algorithms, such as SGD, and as I will review, we understand and can characterize the implicit bias of different algorithms, and can design algorithms with specific biases. But in this talk I will focus on implicit biases of…

## March 2018

## One and two sided composite-composite tests in Gaussian mixture models

Alexandra Carpentier (Otto von Guericke Universitaet)

Abstract: Finding an efficient test for a testing problem is often linked to the problem of estimating a given function of the data. When this function is not smooth, it is necessary to approximate it cleverly in order to build good tests. In this talk, we will discuss two specific testing problems in Gaussian mixtures models. In both, the aim is to test the proportion of null means. The aforementioned link between sharp approximation rates of non-smooth objects and minimax testing…

## Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment

Afonso Bandeira (NYU)

Abstract: Many problems in signal/image processing and computer vision amount to estimating a signal, image, or tri-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is latent transformations of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in Robotics and Computer Vision, where pictures of a scene are obtained from different positions and orientations;…

## When Inference is tractable

David Sontag (MIT)

Abstract: A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as modules within deep architectures. But, when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AI Stats ’18), structure in…

## Statistical theory for deep neural networks with ReLU activation function

Johannes Schmidt-Hieber (Leiden)

Abstract: The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does, however, not guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown…

## April 2018

## Optimality of Spectral Methods for Ranking, Community Detections and Beyond

Jianqing Fan (Princeton University)

Abstract: Spectral methods have been widely used for a large class of challenging problems, ranging from top-K ranking via pairwise comparisons, community detection, factor analysis, among others. Analyses of these spectral methods require sup-norm perturbation analysis of top eigenvectors. This allows us to UNIFORMLY approximate elements in eigenvectors by linear functions of the observed random matrix that can be analyzed further. We first establish such an infinity-norm perturbation bound for top eigenvectors and apply the idea to several challenging problems…
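
As a concrete instance of a spectral ranking method of the kind the abstract refers to, here is a minimal Rank-Centrality-style sketch (the scores are made up, and we use noiseless Bradley-Terry comparison probabilities rather than empirical ones): build a Markov chain whose transition from i to j is proportional to the probability that j beats i, and rank items by its stationary distribution, i.e. the top left eigenvector.

```python
import numpy as np

# hypothetical Bradley-Terry scores; item i beats j with probability w_i/(w_i + w_j)
w = np.array([5.0, 3.0, 2.0, 1.0, 0.5])
n = len(w)

# transition matrix: move i -> j at rate proportional to P(j beats i)
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = w[j] / (w[i] + w[j]) / n
    P[i, i] = 1.0 - P[i].sum()            # lazy diagonal keeps rows stochastic

# top (left) eigenvector via power iteration = stationary distribution
pi = np.ones(n) / n
for _ in range(2000):
    pi = pi @ P
    pi /= pi.sum()
ranking = np.argsort(-pi)
print(ranking)   # [0 1 2 3 4]: stationary mass orders items by score
```

By detailed balance the stationary distribution here is exactly proportional to w; with noisy empirical comparisons it is the perturbed eigenvector whose entrywise (infinity-norm) accuracy the talk's analysis controls.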

## Testing degree corrections in Stochastic Block Models

Subhabrata Sen (Microsoft)

Abstract: The community detection problem has attracted significant attention in recent years, and it has been studied extensively under the framework of a Stochastic Block Model (SBM). However, it is well-known that SBMs fit real data very poorly, and various extensions have been suggested to replicate characteristics of real data. The recovered community assignments are often sensitive to the model used, and this naturally begs the following question: Given a network with community structure, how to decide whether…

## Inference, Computation, and Visualization for Convex Clustering and Biclustering

Genevera Allen (Rice)

Abstract: Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendrogram and cluster heatmap. Recently, several have proposed and studied convex clustering and biclustering which, similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendrogram. In the first part of the talk, we present new convex…

## May 2018

## Size-Independent Sample Complexity of Neural Networks

Ohad Shamir (Weizmann Institute)

Abstract: I'll describe new bounds on the sample complexity of deep neural networks, based on the norms of the parameter matrices at each layer. In particular, we show how certain norms lead to the first explicit bounds which are fully independent of the network size (both depth and width), and are therefore applicable to arbitrarily large neural networks. These results are derived using some novel techniques, which may be of independent interest. Joint work with Noah Golowich (Harvard) and Alexander…

## Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Adel Javanmard (USC)

Abstract: Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers’ valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers’ valuations, i.e., buyers’ preferences. The seller’s goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is…

## Fitting a putative manifold to noisy data

Hariharan Narayanan (Tata Institute of Fundamental Research, Mumbai)

Abstract: We give a solution to the following question from manifold learning. Suppose data belonging to a high dimensional Euclidean space is drawn independently, identically distributed from a measure supported on a low dimensional twice differentiable embedded compact manifold M, and is corrupted by a small amount of i.i.d. Gaussian noise. How can we produce a manifold M' whose Hausdorff distance to M is small and whose reach (normal injectivity radius) is not much smaller than the reach of M? This…

## August 2018

## Resource-efficient ML in 2 KB RAM for the Internet of Things

Prateek Jain (Microsoft Research)

Abstract: We propose an alternative paradigm for the Internet of Things (IoT) where machine learning algorithms run locally on severely resource-constrained edge and endpoint devices without necessarily needing cloud connectivity. This enables many scenarios beyond the pale of the traditional paradigm including low-latency brain implants, precision agriculture on disconnected farms, privacy-preserving smart spectacles, etc. Towards this end, we develop novel tree- and kNN-based algorithms, called Bonsai and ProtoNN, for efficient prediction on IoT devices -- such as those based…

## September 2018

## Variational problems on random structures and their continuum limits

Dejan Slepčev (Carnegie Mellon University)

Abstract: We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts, graph Laplacians…
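
A minimal instance of the graph-based semi-supervised learning problem the abstract mentions, with a made-up six-point cloud: build Gaussian edge weights, form the graph Laplacian L = D − W, and minimize the Dirichlet energy f^T L f subject to the known labels, which amounts to solving a linear system on the unlabeled nodes (the harmonic extension).

```python
import numpy as np

# two tiny clusters in R^2, one labeled point in each (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],      # cluster A
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])      # cluster B
labels = {0: -1.0, 3: +1.0}                             # known labels

# Gaussian edge weights and the (unnormalized) graph Laplacian L = D - W
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(1)) - W

# harmonic extension: solve L_uu f_u = -L_ul f_l on the unlabeled nodes
lab = sorted(labels)
unl = [i for i in range(len(X)) if i not in labels]
f_l = np.array([labels[i] for i in lab])
f_u = np.linalg.solve(L[np.ix_(unl, unl)], -L[np.ix_(unl, lab)] @ f_l)
print(np.sign(f_u))   # unlabeled points inherit their cluster's label
```

As the number of sample points grows, the talk studies the continuum limits of exactly such graph functionals.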

## Reverse hypercontractivity beats measure concentration for information theoretic converses

Jingbo Liu (MIT)

Abstract: Concentration of measure refers to a collection of tools and results from analysis and probability theory that have been used in many areas of pure and applied mathematics. Arguably, the first data science application of measure concentration (under the name "blowing-up lemma") is the proof of strong converses in multiuser information theory by Ahlswede, Gács and Körner in 1976. Since then, measure concentration has found applications in many other information theoretic problems, most notably the converse (impossibility) results in…

## October 2018

## Efficient Algorithms for the Graph Matching Problem in Correlated Random Graphs

Tselil Schramm (Harvard University)

Abstract: The Graph Matching problem is a robust version of the Graph Isomorphism problem: given two not-necessarily-isomorphic graphs, the goal is to find a permutation of the vertices which maximizes the number of common edges. We study a popular average-case variant; we deviate from the common heuristic strategy and give the first quasi-polynomial time algorithm, where previously only sub-exponential time algorithms were known. Based on joint work with Boaz Barak, Chi-Ning Chou, Zhixian Lei, and Yueqi Sheng. Biography: Tselil Schramm is a postdoc in theoretical…

## Locally private estimation, learning, inference, and optimality

John Duchi (Stanford University)

Abstract: In this talk, we investigate statistical learning and estimation under local privacy constraints, where data providers do not trust the collector of the data and so privatize their data before it is even collected. We identify fundamental tradeoffs between statistical utility and privacy in such local models of privacy, providing instance-specific bounds for private estimation and learning problems by developing local minimax risks. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to…

## Algorithmic thresholds for tensor principal component analysis

Aukosh Jagannath (Harvard University)

Abstract: Consider the problem of recovering a rank 1 tensor of order k that has been subject to Gaussian noise. The log-likelihood for this problem is highly non-convex. It is information theoretically possible to recover the tensor with a finite number of samples via maximum likelihood estimation, however, it is expected that one needs a polynomially diverging number of samples to efficiently recover it. What is the cause of this large statistical–to–algorithmic gap? To study this question, we investigate the…

## On the cover time of two classes of graphs

Alan Frieze (Carnegie Mellon University)

Abstract: Dense Graphs: We consider arbitrary graphs G with n vertices and minimum degree at least δn, where δ > 0 is constant. If the conductance of G is sufficiently large then we obtain an asymptotic expression for the cover time C_G of G as the solution to some explicit transcendental equation. Failing this, if the mixing time of a random walk on G is of a lesser magnitude than the cover time, then we can obtain an asymptotic deterministic…
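
The quantity in question is easy to estimate by simulation. A minimal sketch (graphs and sizes chosen purely for illustration) comparing the empirical cover time of a simple random walk on a cycle, which is of order n², against the complete graph, which is of order n log n:

```python
import random

def cover_time(adj, start, rng):
    """Number of steps for a simple random walk to visit every vertex."""
    seen = {start}
    v, steps = start, 0
    while len(seen) < len(adj):
        v = rng.choice(adj[v])
        seen.add(v)
        steps += 1
    return steps

n = 30
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
complete = {i: [j for j in range(n) if j != i] for i in range(n)}

rng = random.Random(0)
avg = lambda g: sum(cover_time(g, 0, rng) for _ in range(200)) / 200
c_cycle, c_complete = avg(cycle), avg(complete)
print(c_cycle, c_complete)   # cycle is near n(n-1)/2; complete graph near n*log(n)
```

The talk's results give asymptotic expressions for such cover times for whole classes of dense and random graphs, rather than one graph at a time.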

## November 2018

## Joint estimation of parameters in Ising Model

Sumit Mukherjee (Columbia University)

Abstract: Inference in the framework of Ising models has received significant attention in Statistics and Machine Learning in recent years. In this talk we study joint estimation of the inverse temperature parameter β, and the magnetization parameter B, given one realization from the Ising model, under the assumption that the underlying graph of the Ising model is completely specified. We show that if the graph is either irregular or sparse, then both the parameters can be estimated at rate n^{-1/2}…

## Optimal hypothesis testing for stochastic block models with growing degrees

Zongming Ma (University of Pennsylvania)

Abstract: In this talk, we discuss optimal hypothesis testing for distinguishing a stochastic block model from an Erdős–Rényi random graph when the average degree grows to infinity with the graph size. We show that linear spectral statistics based on Chebyshev polynomials of the adjacency matrix can approximate signed cycles of growing lengths when the graph is sufficiently dense. The signed cycles have been shown by Banerjee (2018) to determine the likelihood ratio statistic asymptotically. In this way one achieves sharp…

## Model-X knockoffs for controlled variable selection in high dimensional nonlinear regression

Lucas Janson (Harvard University)

Abstract: Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new…

## Bias Reduction and Asymptotic Efficiency in Estimation of Smooth Functionals of High-Dimensional Covariance

Vladimir Koltchinskii (Georgia Institute of Technology)

Abstract: We discuss a recent approach to bias reduction in a problem of estimation of smooth functionals of high-dimensional parameters of statistical models. In particular, this approach has been developed in the case of estimation of functionals of a covariance operator Σ : R^d → R^d of the form ⟨f(Σ), B⟩, based on n i.i.d. observations X_1, …, X_n sampled from the normal distribution with mean zero and covariance Σ, with f : R → R being a…

## December 2018

## Reducibility and Computational Lower Bounds for Some High-dimensional Statistics Problems

Guy Bresler (MIT)

Abstract: The prototypical high-dimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit an intriguing phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these gaps by reducing from conjecturally hard problems in computer science. However, the delicate nature of average-case reductions has…

## Large girth approximate Steiner triple systems

Lutz Warnke (Georgia Institute of Technology)

Abstract: In 1973 Erdős asked whether there are n-vertex partial Steiner triple systems with arbitrarily high girth and quadratically many triples. (Here girth is defined as the smallest integer g ≥ 4 for which some g-element vertex-set contains at least g-2 triples.) We answer this question, by showing existence of approximate Steiner triple systems with arbitrarily high girth. More concretely, for any fixed ℓ ≥ 4 we show that a natural constrained random process typically produces a partial Steiner triple…
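
The simplest case of the constrained random process is easy to simulate: forbidding two triples from sharing a pair is exactly the g = 4 condition (no 4-vertex set containing 2 triples), and the random greedy process already produces quadratically many triples. A minimal sketch (the choice n = 30 is just for illustration; the talk's results handle the much harder conditions for every fixed ℓ):

```python
import random
from itertools import combinations

def random_greedy_partial_steiner(n, rng):
    """Random greedy process: scan uniformly random triples and keep each one
    whose three pairs are all unused. The result is a partial Steiner triple
    system (every pair lies in at most one triple), i.e. girth > 4."""
    used_pairs, triples = set(), []
    candidates = list(combinations(range(n), 3))
    rng.shuffle(candidates)                      # uniformly random order
    for t in candidates:
        pairs = list(combinations(t, 2))
        if all(p not in used_pairs for p in pairs):
            triples.append(t)
            used_pairs.update(pairs)
    return triples

triples = random_greedy_partial_steiner(30, random.Random(0))
all_pairs = [p for t in triples for p in combinations(t, 2)]
print(len(triples))   # near the maximum n(n-1)/6 = 145 triples, so quadratic in n
```

Each pair of vertices can appear in at most one kept triple, which is what makes the count at most n(n−1)/6.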

## February 2019

## Medical Image Imputation

Polina Golland (MIT CSAIL)

Abstract: We present an algorithm for creating high resolution anatomically plausible images that are consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large databases of clinical images contain a wealth of information, medical acquisition constraints result in sparse scans that miss much of the anatomy. These characteristics often render computational analysis impractical as standard processing algorithms tend to fail when applied to such images. Our goal is to enable application of existing algorithms that were originally…

## Capacity lower bound for the Ising perceptron

Nike Sun (MIT)

Abstract: The perceptron is a toy model of a simple neural network that stores a collection of given patterns. Its analysis reduces to a simple problem in high-dimensional geometry, namely, understanding the intersection of the cube (or sphere) with a collection of random half-spaces. Despite the simplicity of this model, its high-dimensional asymptotics are not well understood. I will describe what is known and present recent results. This is a joint work with Jian Ding. Biography: Nike Sun is a…
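
The finite-size version of this geometry question can be checked by brute force. A minimal sketch (dimensions and pattern counts are tiny and purely illustrative): count the cube vertices w ∈ {−1,+1}^d that lie in the intersection of the half-spaces ⟨w, x_i⟩ ≥ 0 defined by random ±1 patterns, and watch the solution set shrink as patterns are added.

```python
import itertools
import numpy as np

def count_solutions(patterns, d):
    """Count w in {-1,+1}^d with <w, x_i> >= 0 for every stored pattern x_i."""
    count = 0
    for w in itertools.product([-1, 1], repeat=d):
        wv = np.array(w)
        if all(wv @ x >= 0 for x in patterns):
            count += 1
    return count

rng = np.random.default_rng(0)
d = 12
patterns = [rng.choice([-1, 1], size=d) for _ in range(8)]

# nested constraint sets: adding patterns can only shrink the solution set
counts = [count_solutions(patterns[:m], d) for m in (0, 2, 4, 6, 8)]
print(counts)   # non-increasing, starting from 2^12 = 4096
```

The capacity question is about the largest ratio of patterns to dimension for which this count typically stays positive as d grows, where brute force is hopeless and the talk's asymptotic analysis takes over.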

Find out more »## March 2019

## Why Aren’t Network Statistics Accompanied By Uncertainty Statements?

Eric Kolaczyk (Boston University)

Abstract: Over 500K scientific articles have been published since 1999 with the word “network” in the title. And the vast majority of these report network summary statistics of one type or another. However, these numbers are rarely accompanied by any quantification of uncertainty. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, necessarily must propagate to any summary statistics reported. Perhaps surprisingly, there is little in the way of…

Find out more »## Univariate total variation denoising, trend filtering and multivariate Hardy-Krause variation denoising

Aditya Guntuboyina (UC Berkeley)

Abstract: Total variation denoising (TVD) is a popular technique for nonparametric function estimation. I will first present a theoretical optimality result for univariate TVD for estimating piecewise constant functions. I will then present related results for various extensions of univariate TVD including adaptive risk bounds for higher-order TVD (also known as trend filtering) as well as a multivariate extension via the Hardy-Krause Variation which avoids the curse of dimensionality to some extent. I will also mention connections to shape restricted…

Find out more »## Subvector Inference in Partially Identified Models with Many Moment Inequalities

Alex Belloni (Duke University)

Abstract: In this work we consider bootstrap-based inference methods for functions of the parameter vector in the presence of many moment inequalities where the number of moment inequalities, denoted by p, is possibly much larger than the sample size n. In particular this covers the case of subvector inference, such as the inference on a single component associated with a treatment/policy variable of interest. We consider a min-max of (centered and non-centered) Studentized statistics and study the properties of the…

Find out more »## Optimization of random polynomials on the sphere in the full-RSB regime

Eliran Subag (New York University)

Abstract: The talk will focus on optimization on the high-dimensional sphere when the objective function is a linear combination of homogeneous polynomials with standard Gaussian coefficients. Such random processes are called spherical spin glasses in physics, and have been extensively studied since the 80s. I will describe certain geometric properties of spherical spin glasses unique to the full-RSB case, and explain how they can be used to design a polynomial time algorithm that finds points within small multiplicative error from…

Find out more »## April 2019

## Exponential line-crossing inequalities

Aaditya Ramdas (Carnegie Mellon University)

Abstract: This talk will present a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We will illustrate this point by presenting a single assumption and a single theorem that together strengthen many tail bounds for martingales, including classical inequalities (1960-80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980-2000) by Shorack and Wellner,…
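The classical fixed-time bounds that the talk strengthens are easy to check empirically. A minimal sketch (my own, with arbitrary parameters) compares the empirical tail of a ±1 random walk against Hoeffding's bound P(S_n >= t) <= exp(-t^2 / (2n)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 1000, 80, 20000
# endpoint S_n of a simple +/-1 random walk, simulated 'trials' times
S = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
empirical = np.mean(S >= t)          # empirical crossing probability at time n
bound = np.exp(-t ** 2 / (2 * n))    # Hoeffding's exponential bound
print(empirical, bound)              # the empirical tail sits below the bound
```

The line-crossing inequalities of the talk control the event that the whole martingale path crosses a time-dependent linear threshold, which is strictly stronger than this single fixed-time statement.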

Find out more »## Logistic Regression: The Importance of Being Improper

Dylan Foster (MIT Institute for Foundations of Data Science)

Abstract: Logistic regression is a fundamental task in machine learning and statistics. For the simple case of linear models, Hazan et al. (2014) showed that any logistic regression algorithm that estimates model weights from samples must exhibit exponential dependence on the weight magnitude. As an alternative, we explore a counterintuitive technique called improper learning, whereby one estimates a linear model by fitting a non-linear model. Past success stories for improper learning have focused on cases where it can improve computational…

Find out more »## Robust Estimation: Optimal Rates, Computation and Adaptation

Chao Gao (University of Chicago)

Abstract: I will discuss the problem of statistical estimation with contaminated data. In the first part of the talk, I will discuss depth-based approaches that achieve minimax rates in various problems. In general, the minimax rate of a given problem with contamination consists of two terms: the statistical complexity without contamination, and the contamination effect in the form of modulus of continuity. In the second part of the talk, I will discuss computational challenges of these depth-based estimators. An…

Find out more »## May 2019

## Optimal Adaptivity of Signed-Polygon Statistics for Network Testing

Tracy Ke (Harvard University)

Abstract: Given a symmetric social network, we are interested in testing whether it has only one community or multiple communities. The desired tests should (a) accommodate severe degree heterogeneity, (b) accommodate mixed-memberships, (c) have a tractable null distribution, and (d) adapt automatically to different levels of sparsity, and achieve the optimal detection boundary. How to find such a test is a challenging problem. We propose the Signed Polygon as a class of new tests. Fix m ≥ 3. For each…

Find out more »## Counting and sampling at low temperatures

Will Perkins (University of Illinois at Chicago)

Abstract: We consider the problem of efficient sampling from the hard-core and Potts models from statistical physics. On certain families of graphs, phase transitions in the underlying physics model are linked to changes in the performance of some sampling algorithms, including Markov chains. We develop new sampling and counting algorithms that exploit the phase transition phenomenon and work efficiently on lattices (and bipartite expander graphs) at sufficiently low temperatures in the phase coexistence regime. Our algorithms are based on Pirogov-Sinai…

Find out more »## September 2019

## GANs, Optimal Transport, and Implicit Density Estimation

Tengyuan Liang (University of Chicago)

Abstract: We first study the rate of convergence for learning distributions with the adversarial framework and Generative Adversarial Networks (GANs), which subsumes Wasserstein, Sobolev, and MMD GANs as special cases. We study a wide range of parametric and nonparametric target distributions, under a collection of objective evaluation metrics. On the nonparametric end, we investigate the minimax optimal rates and fundamental difficulty of the implicit density estimation under the adversarial framework. On the parametric end, we establish a theory for general…

Find out more »## Automated Data Summarization for Scalability in Bayesian Inference

Tamara Broderick (MIT)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: Many algorithms take prohibitively long to run on modern, large datasets. But even in complex data sets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. Typically running algorithms on a much smaller data set will take much less computing time, but…

Find out more »## Probabilistic Modeling meets Deep Learning using TensorFlow Probability

Brian Patton (Google AI)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: TensorFlow Probability provides a toolkit to enable researchers and practitioners to integrate uncertainty with gradient-based deep learning on modern accelerators. In this talk we'll walk through some practical problems addressed using TFP; discuss the high-level interfaces, goals, and principles of the library; and touch on some recent innovations in describing probabilistic graphical models. Time-permitting, we may touch on a couple areas of research interest for the…

Find out more »## Some New Insights On Transfer Learning

Samory Kpotufe (Columbia)

Abstract: The problem of transfer and domain adaptation is ubiquitous in machine learning and concerns situations where predictive technologies, trained on a given source dataset, have to be transferred to a new target domain that is somewhat related. For example, transferring voice recognition trained on American English accents to apply to Scottish accents, with minimal retraining. A first challenge is to understand how to properly model the ‘distance’ between source and target domains, viewed as probability distributions over a feature…

Find out more »## Frontiers of Efficient Neural-Network Learnability

Adam Klivans (UT Austin)

Abstract: What are the most expressive classes of neural networks that can be learned, provably, in polynomial-time in a distribution-free setting? In this talk we give the first efficient algorithm for learning neural networks with two nonlinear layers using tools for solving isotonic regression, a nonconvex (but tractable) optimization problem. If we further assume the distribution is symmetric, we obtain the first efficient algorithm for recovering the parameters of a one-layer convolutional network. These results implicitly make use of a…

Find out more »## October 2019

## Behavior of the Gibbs Sampler in the Imbalanced Case/Bias Correction from Daily Min and Max Temperature Measurements

Natesh Pillai (Harvard)

IDS.190 – Topics in Bayesian Modeling and Computation *Note: The speaker this week will give two shorter talks within the usual session.* Title: Behavior of the Gibbs sampler in the imbalanced case. Abstract: Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and…

Find out more »## Probabilistic Programming and Artificial Intelligence

Vikash Mansinghka (MIT)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: Probabilistic programming is an emerging field at the intersection of programming languages, probability theory, and artificial intelligence. This talk will show how to use recently developed probabilistic programming languages to build systems for robust 3D computer vision, without requiring any labeled training data; for automatic modeling of complex real-world time series; and for machine-assisted analysis of experimental data that is too small and/or messy for standard approaches from machine learning and…

Find out more »## The Planted Matching Problem

Cristopher Moore (Santa Fe Institute)

Abstract: What happens when an optimization problem has a good solution built into it that is partly obscured by randomness? Here we revisit a classic polynomial-time problem, the minimum perfect matching problem on bipartite graphs. If the edges have random weights in [0, 1], Mézard and Parisi (and then Aldous, rigorously) showed that the minimum matching has expected weight zeta(2) = pi^2/6. We consider a "planted" version where a particular matching has weights drawn from an exponential distribution…
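The unplanted baseline mentioned in the abstract can be reproduced in a few lines. This sketch (my own code; parameters are arbitrary, and I use Exp(1) weights, for which the limiting value is the same) solves random assignment instances with scipy and compares the average minimum weight to zeta(2) = pi^2/6 ≈ 1.645:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, trials = 200, 5
costs = []
for _ in range(trials):
    W = rng.exponential(size=(n, n))       # i.i.d. Exp(1) edge weights
    rows, cols = linear_sum_assignment(W)  # minimum-weight perfect matching
    costs.append(W[rows, cols].sum())
avg = np.mean(costs)
print(avg)  # close to zeta(2) = pi^2 / 6 ~ 1.645 for large n
```

In the planted version discussed in the talk, one matching is given atypically small weights, and the question becomes whether the minimum-weight matching recovers it.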

Find out more »## Markov Chain Monte Carlo Methods and Some Attempts at Parallelizing Them

Pierre E. Jacob (Harvard University)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: MCMC methods yield approximations that converge to quantities of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal: it stands at odds with current trends in computing hardware. Namely, it would often be computationally preferable to run many short chains in parallel, but such an approach is flawed because of the so-called "burn-in" bias. This talk will first describe that issue and some known…

Find out more »## Towards Robust Statistical Learning Theory

Stanislav Minsker (USC)

Abstract: Real-world data typically do not fit statistical models or satisfy assumptions underlying the theory exactly, hence reducing the number and strictness of these assumptions helps to lessen the gap between the “mathematical” world and the “real” world. The concept of robustness, in particular, robustness to outliers, plays the central role in understanding this gap. The goal of the talk is to introduce the principles and robust algorithms based on these principles that can be applied in the general framework…

Find out more »## Esther Williams in the Harold Holt Memorial Swimming Pool: Some Thoughts on Complexity

Daniel Simpson (University of Toronto)

IDS.190 – Topics in Bayesian Modeling and Computation Abstract: As data becomes more complex and computational modelling becomes more powerful, we rapidly find ourselves beyond the scope of traditional statistical theory. As we venture beyond the traditional thunderdome, we need to think about how to cope with this additional complexity in our model building. In this talk, I will talk about a few techniques that are useful when specifying prior distributions and building Bayesian models…

Find out more »## Accurate Simulation-Based Parametric Inference in High Dimensional Settings

Maria-Pia Victoria-Feser (University of Geneva)

Abstract: Accurate estimation and inference in finite samples is important for decision making in many experimental and social fields, especially when the available data are complex: they may include mixed types of measurements, be dependent in several ways, and contain missing data, outliers, etc. Indeed, the more complex the data (hence the models), the less accurate asymptotic theory results are in finite samples. This is in particular the case, for example, with logistic regression, with possibly also random effects…

Find out more »## November 2019

## One-shot Information Theory via Poisson Processes

Cheuk Ting Li (UC Berkeley)

Abstract: In information theory, coding theorems are usually proved in the asymptotic regime where the blocklength tends to infinity. While there are techniques for finite blocklength analysis, they are often more complex than their asymptotic counterparts. In this talk, we study the use of Poisson processes in proving coding theorems, which not only gives sharp one-shot and finite blocklength results, but also gives significantly shorter proofs than conventional asymptotic techniques in some settings. Instead of using fixed-size random codebooks, we…

Find out more »## SDP Relaxation for Learning Discrete Structures: Optimal Rates, Hidden Integrality, and Semirandom Robustness

Yudong Chen (Cornell)

Abstract: We consider the problems of learning discrete structures from network data under statistical settings. Popular examples include various block models, Z2 synchronization and mixture models. Semidefinite programming (SDP) relaxation has emerged as a versatile and robust approach to these problems. We show that despite being a relaxation, SDP achieves the optimal Bayes error rate in terms of distance to the target solution. Moreover, SDP relaxation is provably robust under the so-called semirandom model, which frustrates many existing algorithms. Our…

Find out more »## Artificial Bayesian Monte Carlo Integration: A Practical Resolution to the Bayesian (Normalizing Constant) Paradox

Xiao-Li Meng (Harvard University)

Abstract: Advances in Markov chain Monte Carlo in the past 30 years have made Bayesian analysis a routine practice. However, there is virtually no practice of performing Monte Carlo integration from the Bayesian perspective; indeed, this problem has earned the "paradox" label in the context of computing normalizing constants (Wasserman, 2013). We first use the modeling-what-we-ignore idea of Kong et al. (2003) to explain that the crux of the paradox is not with the likelihood theory, which is essentially the same…

Find out more »## Understanding Machine Learning with Statistical Physics

Lenka Zdeborová (Institute of Theoretical Physics, CNRS)

Abstract: The affinity between statistical physics and machine learning has a long history; this is reflected even in the machine learning terminology, which is in part adopted from physics. Current theoretical challenges and open questions about deep learning and statistical learning call for a unified account of the following three ingredients: (a) the dynamics of the learning algorithm, (b) the architecture of the neural networks, and (c) the structure of the data. Most existing theories do not take into account all of…

Find out more »## Stability of a Fluid Model for Fair Bandwidth Sharing with General File Size Distributions

Ruth J Williams (University of California, San Diego)

Abstract: Massoulie and Roberts introduced a stochastic model for a data communication network where file sizes are generally distributed and the network operates under a fair bandwidth sharing policy. It has been a standing problem to prove stability of this general model when the average load on the system is less than the network's capacity. A crucial step in an approach to this problem is to prove stability of an associated measure-valued fluid model. We shall describe prior work on this question done under various strong assumptions and…

Find out more »## A Causal Exposure Response Function with Local Adjustment for Confounding: A study of the health effects of long-term exposure to low levels of fine particulate matter

Francesca Dominici (Harvard University)

Abstract: In the last two decades, ambient levels of air pollution have declined substantially. Yet, as mandated by the Clean Air Act, we must continue to address the following question: is exposure to levels of air pollution that are well below the National Ambient Air Quality Standards (NAAQS) harmful to human health? Furthermore, the highly contentious nature surrounding environmental regulations necessitates casting this question within a causal inference framework. Several parametric and semi-parametric regression modeling approaches have been used to…

Find out more »## Automated Data Summarization for Scalability in Bayesian Inference

Tamara Broderick (MIT)

Abstract: Many algorithms take prohibitively long to run on modern, large data sets. But even in complex data sets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a “coreset”) that is much smaller than the original dataset. Typically running algorithms on a much smaller data set will take much less computing time, but it remains to understand whether the output…

Find out more »## December 2019

## Flexible Perturbation Models for Robustness to Misspecification

Jeffrey Miller (Harvard University)

Abstract: In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric…

Find out more »## Inferring the Evolutionary History of Tumors

Simon Tavaré (Columbia University)

Abstract: Bulk sequencing of tumor DNA is a popular strategy for uncovering information about the spectrum of mutations arising in the tumor, and is often supplemented by multi-region sequencing, which provides a view of tumor heterogeneity. The statistical issues arise from the fact that bulk sequencing makes the determination of sub-clonal frequencies, and other quantities of interest, difficult. In this talk I will discuss this problem, beginning with its setting in population genetics. The data provide an estimate of the…

Find out more »## The Statistical Finite Element Method

Mark Girolami (University of Cambridge)

Abstract: The finite element method (FEM) is one of the great triumphs of modern-day applied mathematics, numerical analysis, and software development. Every area of the sciences and engineering has been positively impacted by the ability to model and study complex physical and natural systems described by systems of partial differential equations (PDEs) via the FEM. In parallel, recent developments in sensor, measurement, and signalling technologies enable the phenomenological study of systems as diverse as protein signalling in the…

Find out more »## February 2020

## Gaussian Differential Privacy, with Applications to Deep Learning

Weijie Su (University of Pennsylvania)

Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on the Renyi divergences. We propose an alternative relaxation we term "f-DP", which has a number of nice properties and avoids some of the difficulties associated with divergence based relaxations. First, f-DP preserves…
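For reference, the Gaussian trade-off function at the heart of this framework has a closed form: G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the optimal type II error for testing N(0,1) against N(mu,1) at type I error alpha. A small sketch of it (my own code):

```python
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    # type II error of the optimal test of N(0,1) vs N(mu,1) at level alpha
    return norm.cdf(norm.ppf(1 - alpha) - mu)

print(gaussian_tradeoff(0.05, 0.0))  # mu = 0 (perfect privacy): returns 1 - alpha = 0.95
print(gaussian_tradeoff(0.05, 1.0))  # mu = 1: strictly below 0.95
```

Larger mu means the two hypotheses are easier to distinguish, i.e., weaker privacy; mu = 0 means an adversary can do no better than random guessing.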

Find out more »## Diffusion K-means Clustering on Manifolds: provable exact recovery via semidefinite relaxations

Xiaohui Chen (University of Illinois at Urbana-Champaign)

Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given…

Find out more »## Predictive Inference with the Jackknife+

Rina Foygel Barber (University of Chicago)

Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically (in contrast, such guarantees…
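A minimal sketch of the jackknife+ construction (my own simplified version: a toy least-squares regressor, and np.quantile in place of the exact order statistics used in the paper):

```python
import numpy as np

def jackknife_plus_interval(X, y, x_test, alpha=0.1):
    """Jackknife+ predictive interval from leave-one-out least-squares fits."""
    n = len(y)
    lo, hi = [], []
    for i in range(n):
        mask = np.arange(n) != i
        A = np.column_stack([np.ones(n - 1), X[mask]])      # design with intercept
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)  # LOO fit mu_{-i}
        r_i = abs(y[i] - (beta[0] + beta[1] * X[i]))        # LOO residual R_i
        pred = beta[0] + beta[1] * x_test                   # mu_{-i}(x_test)
        lo.append(pred - r_i)
        hi.append(pred + r_i)
    # the paper uses exact order statistics of these lists; np.quantile is a
    # close simplification
    return np.quantile(lo, alpha), np.quantile(hi, 1 - alpha)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 100)
y = 2 * X + rng.normal(scale=0.3, size=100)
L, U = jackknife_plus_interval(X, y, x_test=0.5)
print(L, U)  # an interval around the true regression value 2 * 0.5 = 1.0
```

Unlike the plain jackknife, the interval endpoints here vary with i through both the residual R_i and the leave-one-out prediction at x_test, which is what yields the distribution-free guarantee.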

Find out more »## Tales of Random Projections

Kavita Ramanan (Brown University)

Abstract: Properties of random projections of high-dimensional probability measures are of interest in a variety of fields, including asymptotic convex geometry, and potential applications to high-dimensional statistics and data analysis. A particular question of interest is to identify what properties of the high-dimensional measure are captured by its lower-dimensional projections. While fluctuations of these projections have been well studied over the past decade, we describe more recent work on the tail behavior of such projections, and various implications. This talk is based on…
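One concrete instance of this phenomenon is classical: one-dimensional projections of the uniform measure on the sphere of radius sqrt(d) are approximately standard Gaussian for large d. A quick numerical check (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, samples = 500, 20000
Z = rng.standard_normal((samples, d))
# normalize rows to lie on the sphere of radius sqrt(d)
sphere = np.sqrt(d) * Z / np.linalg.norm(Z, axis=1, keepdims=True)
proj = sphere[:, 0]             # project onto the first coordinate direction
print(proj.mean(), proj.var())  # approximately 0 and 1, as for a standard Gaussian
```

The talk concerns the finer question of how the tails of such projections behave, where the Gaussian approximation can break down.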

Find out more »## March 2020

## Does Revolution Work? Evidence from Nepal

Rohini Pande (Yale University)

The last half century has seen the adoption of democratic institutions in much of the developing world. However, the conditions under which de jure democratization leads to the representation of historically disadvantaged groups remain debated, as do the implications of descriptive representation for policy inclusion. Using detailed administrative and survey data from Nepal, we examine political selection in a new democracy, the implications for policy inclusion, and the role of conflict in affecting political transformation. I situate these findings in the context…

Find out more »## April 2020

## [POSTPONED] Guido Imbens – The Applied Econometrics Professor and Professor of Economics, Graduate School of Business, Stanford University

*Please note: this event has been POSTPONED until Fall 2020* See MIT’s COVID-19 policies for more details. About the author: Prof. Guido Imbens’ primary field of interest is Econometrics. Research topics in which he is interested include: causality, program evaluation, identification, Bayesian methods, semi-parametric methods, instrumental variables. Guido Imbens does research in econometrics and statistics. His research focuses on developing methods for drawing causal inferences in observational studies, using matching, instrumental variables, and regression discontinuity designs. Guido Imbens is Professor…

Find out more »## [POSTPONED] The Blessings of Multiple Causes

David Blei (Columbia University)

*Please note: this event has been POSTPONED until Fall 2020* See MIT’s COVID-19 policies for more details. Title: The Blessings of Multiple Causes Abstract: Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods require that we observe all confounders, variables that affect both the causal variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. We describe the deconfounder, a way to do causal…

Find out more »## September 2021

## Designing Equitable Algorithms for Criminal Justice and Beyond

Sharad Goel (Harvard University)

Please join us on Tuesday, September 14, 2021 at 4:00pm for the Distinguished Speaker Seminar with Sharad Goel (Harvard University).

Find out more »## Interpolation and learning with scale dependent kernels

Lorenzo Rosasco (MIT/Universita' di Genova)

Abstract: We study the learning properties of nonparametric ridge-less least squares. In particular, we consider the common case of estimators defined by scale-dependent (Matern) kernels, and focus on the role of scale and smoothness. These estimators interpolate the data, and the scale can be shown to control their stability to noise and sampling. Larger scales, corresponding to smoother functions, improve stability with respect to sampling. However, smaller scales, corresponding to more complex functions,…
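A minimal sketch of ridge-less kernel interpolation (my own toy example, using a Gaussian kernel rather than the Matern kernels of the talk): the fitted function passes exactly through the training data, and the scale parameter controls its behavior between points.

```python
import numpy as np

def gauss_kernel(a, b, scale):
    # pairwise Gaussian kernel matrix; 'scale' is the bandwidth
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * scale ** 2))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 10))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)

scale = 0.05
c = np.linalg.solve(gauss_kernel(x, x, scale), y)  # ridge-less: no regularization
fitted = gauss_kernel(x, x, scale) @ c
max_err = np.max(np.abs(fitted - y))
print(max_err)  # essentially zero: the estimator interpolates the noisy data
```

Despite fitting the noise exactly at the training points, such interpolants can still generalize; how the scale mediates this is the subject of the talk.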

Find out more »## Representation and generalization

Boaz Barak (Harvard University)

Abstract: Self-supervised learning is an increasingly popular approach for learning representations of data that can be used for downstream representation tasks. A practical advantage of self-supervised learning is that it can be used on unlabeled data. However, even when labels are available, self-supervised learning can be competitive with the more "traditional" approach of supervised learning. In this talk we consider "self supervised + simple classifier (SSS)" algorithms, which are obtained by first learning a self-supervised classifier on data, and then…

Find out more »## October 2021

## Causal Matrix Completion

Devavrat Shah (MIT)

Abstract: Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. In general,…
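Under the MCAR assumption that the talk moves beyond, a simple baseline already works well: inverse-propensity weight the observed entries and truncate the SVD. A toy sketch (my own code, not the talk's method):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.5
u = rng.uniform(1, 2, n)
v = rng.uniform(1, 2, n)
M = np.outer(u, v)                        # rank-1 ground truth
mask = rng.random((n, n)) < p             # MCAR: each entry observed w.p. p
Y = np.where(mask, M, 0.0) / p            # inverse-propensity weighting: E[Y] = M
U_, s, Vt = np.linalg.svd(Y)
M_hat = s[0] * np.outer(U_[:, 0], Vt[0])  # keep only the top singular component
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(rel_err)  # modest relative error despite half the entries missing
```

When missingness is confounded with the matrix entries themselves, dividing by a single constant p is no longer valid, which motivates the causal framework of the talk.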

Find out more »## Recent results in planted assignment problems

Yihong Wu (Yale University)

Abstract: Motivated by applications such as particle tracking, network de-anonymization, and computer vision, a recent thread of research is devoted to statistical models of assignment problems, in which the data are random weighted graphs correlated with the latent permutation. In contrast to problems such as planted clique or the stochastic block model, the major difference here is the lack of low-rank structure, which brings forth new challenges in both statistical analysis and algorithm design. In the first half of the talk,…

Find out more »## Breaking the Sample Size Barrier in Reinforcement Learning

Yuting Wei (Wharton School, UPenn)

Abstract: Reinforcement learning (RL), which is frequently modeled as sequential learning and decision making in the face of uncertainty, is garnering growing interest in recent years due to its remarkable success in practice. In contemporary RL applications, it is increasingly more common to encounter environments with prohibitively large state and action space, thus imposing stringent requirements on the sample efficiency of the RL algorithms in use. Despite the empirical success, however, the theoretical underpinnings for many popular RL algorithms remain…

Find out more »## Instance Dependent PAC Bounds for Bandits and Reinforcement Learning

Kevin Jamieson (University of Washington)

Abstract: The sample complexity of an interactive learning problem, such as multi-armed bandits or reinforcement learning, is the number of interactions with nature required to output an answer (e.g., a recommended arm or policy) that is approximately close to optimal with high probability. While minimax guarantees can be useful rules of thumb to gauge the difficulty of a problem class, algorithms optimized for this worst-case metric often fail to adapt to “easy” instances where fewer samples suffice. In this talk, I…

Find out more »## Revealing the simplicity of high-dimensional objects via pathwise analysis

Ronen Eldan (Weizmann Inst. of Science and Princeton)

Abstract: One of the main reasons behind the success of high-dimensional statistics and modern machine learning in taming the curse of dimensionality is that many classes of high-dimensional distributions are surprisingly well-behaved and, when viewed correctly, exhibit a simple structure. This emergent simplicity is in the center of the theory of "high-dimensional phenomena", and is manifested in principles such as "Gaussian-like behavior" (objects of interest often inherit the properties of the Gaussian measure), "dimension-free behavior" (expressed in inequalities which do…

Find out more »## November 2021

## Designing AI for Racial Equity: Translating Ethics into Practice

S. Craig Watkins (MLK Visiting Professor - MIT)

Please join us on Monday, November 1, 2021 at 4:00pm for the Distinguished Speaker Seminar with S. Craig Watkins (MLK Visiting Professor).

Find out more »## Asymptotics of learning on dependent and structured random objects

Morgane Austern (Harvard University)

Abstract: Classical statistical inference relies on numerous tools from probability theory to study the properties of estimators. However, these same tools are often inadequate for studying modern machine learning problems that frequently involve structured data (e.g., networks) or complicated dependence structures (e.g., dependent random matrices). In this talk, we extend universal limit theorems beyond the classical setting. Firstly, we consider distributionally "structured" and dependent random objects, i.e., random objects whose distribution is invariant under the action of an amenable group. We…

Find out more »## Characterizing the Type 1-Type 2 Error Trade-off for SLOPE

Cynthia Rush (Columbia University)

Abstract: Sorted L1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this talk, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I and type II error. Additionally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms…

Find out more »## Precise high-dimensional asymptotics for AdaBoost via max-margins & min-norm interpolants

Pragya Sur (Harvard University)

Abstract: This talk will introduce a precise high-dimensional asymptotic theory for AdaBoost on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is asymptotically separable. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the max-min-L1-margin and the min-L1-norm interpolant. In turn, this will…

Find out more »## December 2021

## The Geometry of Particle Collisions: Hidden in Plain Sight

Jesse Thaler (MIT)

Abstract: Since the 1960s, particle physicists have developed a variety of data analysis strategies for the goal of comparing experimental measurements to theoretical predictions. Despite their numerous successes, these techniques can seem esoteric and ad hoc, even to practitioners in the field. In this talk, I explain how many particle physics analysis tools have a natural geometric interpretation in an emergent "space" of collider events induced by the Wasserstein metric. This in turn suggests new analysis strategies to interpret generic…

Find out more »## The Optimality of Coarse Menues

Dirk Bergemann (Yale University)

Please join us on Monday, December 6, 2021 at 4:00pm for the Distinguished Speaker Seminar with Dirk Bergemann (Yale University).

Find out more »