Asymptotics and concentration for sample covariance

Vladimir Koltchinskii (Georgia Tech)

We will discuss recent moment bounds and concentration inequalities for sample covariance operators based on a sample of n i.i.d. Gaussian random variables taking values in an infinite-dimensional space. These bounds show that the size of the operator norm of the deviation of the sample covariance from the true covariance can be completely characterized by two parameters: the operator norm of the true covariance and its so-called "effective rank". These results rely on Talagrand's generic chaining bounds and on…
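To make the two parameters concrete: the effective rank is $r(\Sigma) = \operatorname{tr}(\Sigma)/\|\Sigma\|$, and, up to absolute constants, the characterization takes the form $\mathbb{E}\|\hat\Sigma - \Sigma\| \asymp \|\Sigma\| \max\{\sqrt{r(\Sigma)/n},\, r(\Sigma)/n\}$. The Python sketch below is a finite-dimensional illustration only. It assumes mean-zero Gaussian data, so $\hat\Sigma = n^{-1}\sum_i X_i X_i^\top$ needs no centering, and it uses a hypothetical diagonal covariance with decaying eigenvalues. It compares one simulated deviation with that scale, and the two should agree only up to a constant factor.

```python
import numpy as np


def effective_rank(sigma: np.ndarray) -> float:
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op of a PSD matrix."""
    op_norm = np.linalg.norm(sigma, ord=2)  # largest singular value (= top eigenvalue for PSD)
    return float(np.trace(sigma) / op_norm)


def covariance_deviation(sigma: np.ndarray, n: int, rng: np.random.Generator) -> float:
    """Operator-norm error ||Sigma_hat - Sigma|| from n i.i.d. N(0, Sigma) draws."""
    d = sigma.shape[0]
    x = rng.multivariate_normal(np.zeros(d), sigma, size=n)  # shape (n, d)
    sigma_hat = x.T @ x / n  # known zero mean, so no centering
    return float(np.linalg.norm(sigma_hat - sigma, ord=2))


rng = np.random.default_rng(0)
d, n = 200, 500
sigma = np.diag(1.0 / np.arange(1, d + 1))  # hypothetical: eigenvalues 1, 1/2, ..., 1/d
r = effective_rank(sigma)
scale = np.linalg.norm(sigma, ord=2) * max(np.sqrt(r / n), r / n)
print(f"r = {r:.2f}, deviation = {covariance_deviation(sigma, n, rng):.3f}, scale = {scale:.3f}")
```

The effective rank can be much smaller than the ambient dimension $d$. This is why the bound remains meaningful in infinite dimensions, where $d$ itself is not available.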

A Geometric Approach to Weakly Identified Econometric Models

Anna Mikusheva (MIT Economics)

Many nonlinear econometric models show evidence of weak identification. In this paper we consider minimum-distance statistics and show that, in a broad class of models, the problem of testing under weak identification is closely related to the problem of testing a curved null in a finite-sample Gaussian model. Using the curvature of the model, we develop new finite-sample bounds on the distribution of minimum-distance statistics, which we show can be used to detect weak identification and to construct tests…
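Here is a toy version of the curved-null problem. Observe $\xi \sim N(\mu, I_2)$ and test $H_0: \mu = m(\theta)$ for some $\theta$. The curve $m(\theta) = (\theta, c\theta^2)$ and its curvature parameter $c$ are hypothetical choices for illustration, not the models of the talk. The minimum-distance statistic is $\min_\theta \|\xi - m(\theta)\|^2$, and this Python sketch approximates the minimum on a grid. When $c = 0$ the null is a line and, ignoring grid truncation, the statistic equals $\xi_2^2$, which is $\chi^2_1$ under the null. For $c \neq 0$ its null distribution depends on the curvature.

```python
import numpy as np


def curve(theta: np.ndarray, c: float) -> np.ndarray:
    """Hypothetical curved null set m(theta) = (theta, c * theta**2) in R^2."""
    return np.stack([theta, c * theta**2], axis=-1)


def min_distance_stat(xi: np.ndarray, c: float, grid: np.ndarray) -> tuple[float, float]:
    """Grid approximation of min_theta ||xi - m(theta)||^2 and its minimizer."""
    dist2 = np.sum((xi - curve(grid, c)) ** 2, axis=1)  # one squared distance per grid point
    k = int(np.argmin(dist2))
    return float(dist2[k]), float(grid[k])


grid = np.linspace(-5.0, 5.0, 10_001)
rng = np.random.default_rng(1)
xi = curve(np.array(0.5), c=2.0) + rng.standard_normal(2)  # data generated under the null
stat, theta_hat = min_distance_stat(xi, c=2.0, grid=grid)
print(f"statistic = {stat:.3f}, minimizer = {theta_hat:.3f}")
```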

High Dimensional Covariance Matrix Estimations and Factor Models

Yuan Liao (University of Maryland)

Large covariance matrix estimation is crucial for high-dimensional statistical inference, and it has also played a central role in factor analysis. Applications include the analysis of financial risk, climate data, and genomic data, as well as PCA. Commonly used approaches to estimating large covariance matrices include shrinkage and sparse modeling. This talk will present new theoretical results on estimating large (inverse) covariance matrices under large-N, large-T asymptotics, with a focus on the role such estimation plays in statistical inference for large panel data…
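As a concrete instance of the sparse-modeling approach, here is a minimal Python sketch of a hard-thresholding covariance estimator. It sets to zero every off-diagonal entry of the sample covariance whose magnitude is below a threshold $\tau$. This generic estimator is chosen for illustration and is not the method of the talk. It does not use factor structure, and its threshold constant is a hypothetical choice on the commonly used $\sqrt{\log N / T}$ scale. Note that thresholding does not guarantee a positive semidefinite result.

```python
import numpy as np


def hard_threshold_cov(x: np.ndarray, tau: float) -> np.ndarray:
    """Zero out off-diagonal sample covariances with |s_ij| < tau.

    x has shape (T, N): T observations of N variables.
    """
    s = np.cov(x, rowvar=False)  # N x N sample covariance (divisor T - 1)
    keep = np.abs(s) >= tau
    np.fill_diagonal(keep, True)  # variances are never thresholded
    return np.where(keep, s, 0.0)


rng = np.random.default_rng(2)
T, N = 100, 50
x = rng.standard_normal((T, N))  # true covariance is the identity, i.e. sparse
tau = 2.0 * np.sqrt(np.log(N) / T)  # hypothetical constant 2 on the sqrt(log N / T) scale
s_thr = hard_threshold_cov(x, tau)
print(np.count_nonzero(s_thr) - N, "off-diagonal entries survive thresholding")
```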

Beyond Berry Esseen: Structure and Learning of Sums of Random Variables

Constantinos Daskalakis (MIT EECS)

The celebrated Berry-Esseen theorem and its variants provide a useful Gaussian approximation to the distribution of a sum of independent random variables. In this talk, I will restrict attention to the important case of sums of integer-valued random variables, arguing that Berry-Esseen theorems fall short of characterizing their general structure. I will offer stronger finitary central limit theorems that tightly characterize the structure of these distributions, and show their implications for learning. In particular, I will present algorithms that can learn sums…
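A Poisson binomial distribution gives a concrete instance of sums of integer random variables: take $S = X_1 + \dots + X_n$ with independent $X_i \sim \mathrm{Bernoulli}(p_i)$. The Python sketch below computes the exact law of $S$ by convolution. It then computes the Kolmogorov distance between $S$ and the Gaussian with matching mean and variance, which is the quantity that Berry-Esseen bounds control. It is an illustration only, not one of the talk's algorithms, and it assumes at least one $p_i$ lies strictly between 0 and 1 so that the variance is positive. Because $S$ is integer-valued and the Gaussian is continuous, the two are never close in total variation. This hints at why a Gaussian alone cannot describe the full structure of such sums.

```python
import math


def poisson_binomial_pmf(ps: list[float]) -> list[float]:
    """Exact PMF of S = X_1 + ... + X_n for independent X_i ~ Bernoulli(p_i)."""
    pmf = [1.0]  # empty sum: S = 0 with probability 1
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1.0 - p)  # X_i = 0 keeps the sum at k
            new[k + 1] += q * p  # X_i = 1 moves it to k + 1
        pmf = new
    return pmf


def kolmogorov_to_normal(ps: list[float]) -> float:
    """sup_x |P(S <= x) - Phi((x - mu) / sigma)| with matching mean and variance."""
    mu = sum(ps)
    sigma = math.sqrt(sum(p * (1.0 - p) for p in ps))  # assumed > 0

    def gauss_cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    cdf, worst = 0.0, 0.0
    for k, q in enumerate(poisson_binomial_pmf(ps)):
        worst = max(worst, abs(cdf - gauss_cdf(k)))  # left limit of the step CDF at k
        cdf += q
        worst = max(worst, abs(cdf - gauss_cdf(k)))  # value of the step CDF at k
    return worst


for n in (10, 100, 1000):
    print(n, kolmogorov_to_normal([0.5] * n))
```

Checking only the jump points suffices for the supremum. Between consecutive integers the step CDF is constant while the Gaussian CDF is monotone, so the largest gap on each interval occurs at an endpoint.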

Optimal stochastic transport

Alfred Galichon (Sciences Po, Paris)

We explore the link between the Monge-Kantorovich problem and the Skorohod embedding problem. This question arises in particular in mathematical finance when seeking model-free bounds on option prices, given that the marginal distributions of the underlying at various maturities are implied by European option prices. We provide a stochastic control approach, which we connect to several important constructions. Finally, we revisit in this light the celebrated Azéma-Yor solution of the Skorohod embedding problem. This talk is based on joint works…
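One ingredient of the Azéma-Yor solution is easy to compute. For a centered, integrable target law $\mu$, the construction stops Brownian motion $B$ at $\tau = \inf\{t \ge 0 : S_t \ge \Psi_\mu(B_t)\}$. Here $S_t = \sup_{s \le t} B_s$, and $\Psi_\mu(x) = \mathbb{E}[X \mid X \ge x]$ for $X \sim \mu$ is the barycenter function. The Python sketch below evaluates $\Psi_\mu$ at the atoms of a discrete law with positive weights, using exact rational arithmetic. Between two atoms, $\Psi_\mu$ equals its value at the next atom above. The sketch does not simulate the stopping time or reproduce the stochastic control approach of the talk.

```python
from fractions import Fraction


def barycenter(law: dict[Fraction, Fraction]) -> dict[Fraction, Fraction]:
    """Psi(x) = E[X | X >= x] at each atom x of a discrete law {atom: probability > 0}."""
    psi: dict[Fraction, Fraction] = {}
    tail_mass, tail_moment = Fraction(0), Fraction(0)
    for x in sorted(law, reverse=True):  # sweep from the largest atom downward
        tail_mass += law[x]  # P(X >= x)
        tail_moment += x * law[x]  # E[X; X >= x]
        psi[x] = tail_moment / tail_mass
    return psi


half = Fraction(1, 2)
print(barycenter({Fraction(-1): half, Fraction(1): half}))  # a centered two-point law
```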

Sparse Canonical Correlation Analysis: Minimaxity and Adaptivity

Harrison Huibin Zhou (Yale University)

Canonical correlation analysis is a widely used multivariate statistical technique for exploring the relation between two sets of variables. In this talk we consider the problem of estimating the leading canonical correlation directions in high-dimensional settings. Recently, under the assumption that the leading canonical correlation directions are sparse, various procedures have been proposed for many high-dimensional applications involving massive data sets. However, little theoretical justification is available in the literature. In this talk, we establish rate-optimal…
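For reference, classical (non-sparse) CCA finds the leading canonical pair by whitening. Given sample covariances $\hat\Sigma_{xx}$, $\hat\Sigma_{yy}$ and $\hat\Sigma_{xy}$, the leading singular value of $\hat\Sigma_{xx}^{-1/2}\hat\Sigma_{xy}\hat\Sigma_{yy}^{-1/2}$ is the leading sample canonical correlation. Mapping its singular vectors back through $\hat\Sigma_{xx}^{-1/2}$ and $\hat\Sigma_{yy}^{-1/2}$ gives the directions. The Python sketch below assumes $n$ exceeds both dimensions. When $p$ or $q$ exceeds $n$, the sample covariances are singular and this whitening breaks down, which is part of why sparsity assumptions are used in high dimensions. The sketch is not the sparse procedure of the talk, and its simulated data are hypothetical.

```python
import numpy as np


def inv_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def leading_cca(x: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]:
    """Leading sample canonical correlation and directions (u, v); x is (n, p), y is (n, q)."""
    n = x.shape[0]
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    sxx, syy, sxy = xc.T @ xc / n, yc.T @ yc / n, xc.T @ yc / n
    wx, wy = inv_sqrt(sxx), inv_sqrt(syy)
    left, s, right_t = np.linalg.svd(wx @ sxy @ wy)  # singular values = canonical correlations
    u = wx @ left[:, 0]  # undo the whitening of x
    v = wy @ right_t[0]  # undo the whitening of y
    return float(s[0]), u, v


rng = np.random.default_rng(4)
n = 500
x = rng.standard_normal((n, 4))
y = np.column_stack([0.8 * x[:, 0] + 0.6 * rng.standard_normal(n), rng.standard_normal(n)])
rho, u, v = leading_cca(x, y)
print(f"leading canonical correlation = {rho:.3f}")
```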

Linear Regression with Many Included Covariates

Whitney Newey (MIT Economics)

We consider asymptotic inference for linear regression coefficients when the number of included covariates grows as fast as the sample size. We find a limiting normal distribution with an asymptotic variance that is larger than the usual one. We also find that all of the usual versions of heteroskedasticity-consistent standard error estimators are inconsistent under these asymptotics. The problem with these standard errors is that they do not make a correct "degrees of freedom" adjustment. We propose a new heteroskedasticity…
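The degrees-of-freedom issue can be seen from the OLS residuals $\hat e = (I - H)y$, where $H = X(X^\top X)^{-1}X^\top$. Under homoskedasticity $\mathbb{E}[\hat e_i^2] = (1 - h_{ii})\sigma^2$, and the leverages $h_{ii}$ sum to $K$, the number of covariates. Squared residuals are therefore biased downward, and the bias stays non-negligible when $K/n$ does not vanish. The Python sketch below computes the HC0 (Eicker-White) estimator and the leverage-corrected HC2 variant. Both are among the usual estimators that the abstract says are inconsistent in this regime, so the sketch illustrates the mechanism and is not the new estimator proposed in the talk. The simulated design is hypothetical.

```python
import numpy as np


def hc_variances(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """OLS coefficients with HC0 (Eicker-White) and HC2 sandwich variance estimates."""
    xtx_inv = np.linalg.inv(x.T @ x)
    beta = xtx_inv @ (x.T @ y)
    resid = y - x @ beta
    leverage = np.einsum("ij,jk,ik->i", x, xtx_inv, x)  # h_ii = x_i' (X'X)^{-1} x_i

    def sandwich(w: np.ndarray) -> np.ndarray:
        meat = (x * w[:, None]).T @ x  # sum_i w_i x_i x_i'
        return xtx_inv @ meat @ xtx_inv

    hc0 = sandwich(resid**2)
    hc2 = sandwich(resid**2 / (1.0 - leverage))  # rescale by 1 / (1 - h_ii)
    return beta, hc0, hc2


rng = np.random.default_rng(5)
n, k = 200, 100  # hypothetical design with k / n = 0.5
x = rng.standard_normal((n, k))
y = x[:, 0] + (1.0 + np.abs(x[:, 1])) * rng.standard_normal(n)  # heteroskedastic errors
beta, hc0, hc2 = hc_variances(x, y)
print(f"se(beta_1): HC0 = {np.sqrt(hc0[0, 0]):.3f}, HC2 = {np.sqrt(hc2[0, 0]):.3f}")
```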

Central Limit Theorems and Bootstrap in High Dimensions

Denis Chetverikov (UCLA)

We derive central limit and bootstrap theorems for probabilities that centered high-dimensional vector sums hit rectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for the probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i \in A)$, where $X_1, \dots, X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a rectangle or, more generally, a sparsely convex set. We show that the approximation error converges to zero even if $p = p_n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c, C > 0$. The result…
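The rectangle case covers the common max-type statistic, because $\max_{1\le j\le p} n^{-1/2}\sum_{i=1}^n X_{ij} \le t$ holds exactly when the scaled sum lies in the rectangle $(-\infty, t]^p$. Below is a minimal Python sketch of one common bootstrap for this setting, the Gaussian multiplier bootstrap, which estimates a critical value for that statistic. It assumes mean-zero observations under the null. The function names, the number of bootstrap draws, and the simulated data are hypothetical choices, and the sketch is not the authors' code.

```python
import numpy as np


def max_stat(x: np.ndarray) -> float:
    """T = max_j n^{-1/2} sum_i X_ij for an (n, p) array of observations."""
    return float(np.max(x.sum(axis=0) / np.sqrt(x.shape[0])))


def multiplier_bootstrap_quantile(
    x: np.ndarray, alpha: float, n_boot: int, rng: np.random.Generator
) -> float:
    """(1 - alpha) quantile of max_j n^{-1/2} sum_i e_i (X_ij - mean_j), e_i i.i.d. N(0, 1)."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)  # center each coordinate
    e = rng.standard_normal((n_boot, n))  # one row of multipliers per bootstrap draw
    boot_max = (e @ xc / np.sqrt(n)).max(axis=1)  # shape (n_boot,)
    return float(np.quantile(boot_max, 1.0 - alpha))


rng = np.random.default_rng(6)
n, p = 200, 500  # hypothetical sizes with p > n
x = rng.standard_normal((n, p))  # mean zero, so the null holds
t_obs = max_stat(x)
boot_q = multiplier_bootstrap_quantile(x, alpha=0.05, n_boot=1000, rng=rng)
print(f"T = {t_obs:.3f}, bootstrap 95% critical value = {boot_q:.3f}, reject: {t_obs > boot_q}")
```

Conditional on the data, each bootstrap coordinate is Gaussian with the sample covariance of the centered observations. This is what lets the bootstrap mimic the Gaussian approximation without estimating a $p \times p$ covariance explicitly.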

Random polytopes and estimation of convex bodies

Victor-Emmanuel Brunel (Yale)

In this talk we discuss properties of random polytopes. In particular, we study the convex hull of i.i.d. random points whose law is supported on a convex body. We propose deviation and moment inequalities for this random polytope, and then discuss its optimality as an estimator of the support of the probability measure, which may be unknown. We also define a notion of multidimensional quantile sets for probability measures in a Euclidean space. These are convex…
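In two dimensions the estimator is easy to compute. The Python sketch below uses Andrew's monotone chain algorithm to build the convex hull of i.i.d. uniform points in the unit square, a hypothetical choice of convex body, and reports the area the hull misses. Since the hull lies inside the support, this missing area equals the Lebesgue measure of the symmetric difference between the estimator and the support.

```python
import random

Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:  # lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):  # upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop endpoints shared by both chains


def polygon_area(vertices: list[Point]) -> float:
    """Shoelace formula for a simple polygon with vertices listed in order."""
    m = len(vertices)
    s = sum(
        vertices[i][0] * vertices[(i + 1) % m][1] - vertices[(i + 1) % m][0] * vertices[i][1]
        for i in range(m)
    )
    return abs(s) / 2.0


rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    sample = [(rng.random(), rng.random()) for _ in range(n)]  # uniform on the unit square
    print(n, 1.0 - polygon_area(convex_hull(sample)))  # area missed by the hull estimator
```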

The exact k-SAT threshold for large k

Nike Sun (MSR New England and MIT Mathematics)

We establish the random k-SAT threshold conjecture for all k exceeding an absolute constant $k_0$. That is, there is a single critical value $\alpha_*(k)$ such that a random k-SAT formula at clause-to-variable ratio $\alpha$ is, with high probability, satisfiable for $\alpha < \alpha_*(k)$ and unsatisfiable for $\alpha > \alpha_*(k)$. The threshold $\alpha_*(k)$ matches the explicit prediction derived by statistical physicists on the basis of the one-step replica symmetry breaking (1RSB) heuristic. In the talk I will describe the main obstacles in computing the threshold, and explain how they…
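The Python sketch below shows the objects involved. It draws random k-SAT formulas with $m = \mathrm{round}(\alpha n)$ clauses, each on $k$ distinct variables with independent random signs, and checks satisfiability by brute force. This is one common random model, assumed here for illustration. Brute force only works for very small $n$. The sketch also uses $k = 3$ for speed, even though the theorem concerns $k \ge k_0$, so the printed fractions only hint at a transition blurred by finite-size effects.

```python
import itertools
import random

Clause = tuple[int, ...]  # literal +v means x_v, -v means NOT x_v (variables 1..n)


def random_ksat(n: int, k: int, alpha: float, rng: random.Random) -> list[Clause]:
    """Random k-SAT formula with round(alpha * n) clauses, each on k distinct variables."""
    m = round(alpha * n)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return formula


def satisfiable(formula: list[Clause], n: int) -> bool:
    """Brute-force check over all 2^n assignments (only feasible for small n)."""
    for bits in itertools.product((False, True), repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in formula):
            return True
    return False


rng = random.Random(0)
n, k, trials = 10, 3, 20
for alpha in (2.0, 3.0, 4.0, 5.0, 6.0, 7.0):
    frac = sum(satisfiable(random_ksat(n, k, alpha, rng), n) for _ in range(trials)) / trials
    print(f"alpha = {alpha}: fraction satisfiable = {frac:.2f}")
```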

