Linear Regression with Many Included Covariates

Whitney Newey (MIT Economics)
E62-587

We consider asymptotic inference for linear regression coefficients when the number of included covariates grows as fast as the sample size. We find a limiting normal distribution with asymptotic variance that is larger than the usual one. We also find that all of the usual versions of heteroskedasticity-consistent standard error estimators are inconsistent under these asymptotics. The problem with these standard errors is that they do not make a correct "degrees of freedom" adjustment. We propose a new heteroskedasticity…

Sparse Canonical Correlation Analysis: Minimaxity and Adaptivity

Harrison Huibin Zhou (Yale University)
E62-587

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. In this talk we consider the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high-dimensional applications involving massive data sets. However, little theoretical justification has been available in the literature. In this talk, we establish rate-optimal…

Optimal stochastic transport

Alfred Galichon (Sciences Po, Paris)
E62-587

We explore the link between the Monge-Kantorovich problem and the Skorohod embedding problem. This question arises in particular in mathematical finance when seeking model-free bounds on some option prices when the marginal distributions of the underlying at various maturities are implied by European option prices. We provide a stochastic control approach which we connect to several important constructions. Finally, we revisit in this light the celebrated Azéma-Yor solution of the Skorohod embedding problem. This talk is based on joint works…

Beyond Berry Esseen: Structure and Learning of Sums of Random Variables

Constantinos Daskalakis (MIT EECS)
E62-587

The celebrated Berry-Esseen theorem, and its variants, provide a useful approximation to the sum of independent random variables by a Gaussian. In this talk, I will restrict attention to the important case of sums of integer random variables, arguing that Berry-Esseen theorems fall short of characterizing their general structure. I will offer stronger finitary central limit theorems, tightly characterizing the structure of these distributions, and show their implications for learning. In particular, I will present algorithms that can learn sums…

High Dimensional Covariance Matrix Estimations and Factor Models

Yuan Liao (University of Maryland)
E62-587

Large covariance matrix estimation is crucial for high-dimensional statistical inference, and has also played a central role in factor analysis. Applications are found in analyzing financial risks, climate data, genomic data, PCA, etc. Commonly used approaches to estimating large covariances include shrinkage and sparse modeling. This talk will present new theoretical results on estimating large (inverse) covariance matrices under large N large T asymptotics, with a focus on the roles it plays in statistical inferences for large panel data…

Uncertainty quantification and confidence sets in high-dimensional models

Richard Nickl (University of Cambridge)
E62-587

While much attention has been paid recently to the construction of optimal algorithms that adaptively estimate low-dimensional parameters (described by sparsity, low-rank, or smoothness) in high-dimensional models, the theory of statistical inference and uncertainty quantification (in particular hypothesis tests and confidence sets) is much less well-developed. We will discuss some perhaps surprising impossibility results in the basic high-dimensional compressed sensing model, and some of the recently emerging positive results in the area.

Clustering of sparse networks: Phase transitions and optimal algorithms

Lenka Zdeborova (CEA)
E62-587

A central problem in analyzing networks is partitioning them into modules or communities: clusters with a statistically homogeneous pattern of links to each other or to the rest of the network. A principled approach to this problem is to fit the network to a stochastic block model; this task is, however, intractable to solve exactly. In this talk we discuss the application of the belief propagation algorithm to module detection. In the first part we present an asymptotically exact analysis of the stochastic…

Superposition codes and approximate-message-passing decoder

Florent Krzakala (Université Pierre et Marie)
E62-587

Superposition codes are an asymptotically capacity-achieving scheme for the Additive White Gaussian Noise channel. I will first show how a practical iterative decoder can be built based on a Belief Propagation type approach, closely related to the one used in compressed sensing and sparse estimation problems. Secondly, I will show how the idea of spatial coupling in this context allows one to build efficient and practical capacity-achieving coding and decoding schemes. The links between the present problem, sparse estimations, and…

Regression-Robust Designs of Controlled Experiments

Nathan Kallus (MIT)
E62-587

Achieving balance between experimental groups is a cornerstone of causal inference. Without balance, any observed difference may be attributed to a difference other than the treatment alone. In controlled/clinical trials, where the experimenter controls the administration of treatment, complete randomization of subjects has been the gold standard for achieving this balance because it allows for unbiased and consistent estimation and inference in the absence of any a priori knowledge or measurements. However, since estimator variance under complete randomization may be…

Uniform Post Selection Inference for Z-estimation problems

Alex Belloni (Duke University)
E62-587

In this talk we will consider inference with high-dimensional data. We propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest alpha_0, the coefficient on the regressor of interest, such as the treatment variable or a policy variable. We show how to apply these methods to Z-estimators (for example, logistic regression and quantile regression). These methods allow one to estimate alpha_0 at the root-n rate when the total number p of other…


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764