Inference in High Dimensions for (Mixed) Generalized Linear Models: the Linear, the Spectral and the Approximate

Marco Mondelli, Institute of Science and Technology Austria
E18-304

Abstract: In a generalized linear model (GLM), the goal is to estimate a d-dimensional signal x from an n-dimensional observation of the form f(Ax, w), where A is a design matrix and w is a noise vector. Well-known examples of GLMs include linear regression, phase retrieval, 1-bit compressed sensing, and logistic regression. We focus on the high-dimensional setting in which both the number of measurements n and the signal dimension d diverge, with their ratio tending to a fixed constant.…
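
For concreteness, the sketch below instantiates the observation model y = f(Ax, w) for noiseless phase retrieval, f(u) = |u|, and computes one standard spectral estimate of x. The choice of f, the Gaussian design, the y²-weighting, and the dimensions are illustrative assumptions rather than the exact setting analyzed in the talk.

```python
import numpy as np

# Illustrative GLM data y = f(Ax, w), here noiseless phase retrieval f(u) = |u|,
# followed by a standard spectral estimator (leading eigenvector of a weighted
# covariance of the measurement vectors).
rng = np.random.default_rng(0)
n, d = 2000, 400                       # n/d held at a fixed ratio (here 5)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                 # unit-norm ground-truth signal
A = rng.standard_normal((n, d))        # Gaussian design matrix
y = np.abs(A @ x)                      # observations y_i = |<a_i, x>|

# Spectral estimate: top eigenvector of (1/n) * sum_i y_i^2 a_i a_i^T.
T = (A * (y ** 2)[:, None]).T @ A / n
eigvals, eigvecs = np.linalg.eigh(T)
x_spec = eigvecs[:, -1]                # defined only up to a global sign
print("overlap |<x_spec, x>| =", abs(x_spec @ x))
```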

Structural Deep Learning in Financial Asset Pricing

Jianqing Fan, Princeton University
E18-304

Abstract: We develop new structural nonparametric methods, guided by financial economics theory, for estimating conditional asset pricing models using deep neural networks, employing time-varying conditional information on alphas and betas carried by firm-specific characteristics. Contrary to many applications of neural networks in economics, we can open the “black box” of machine learning predictions by incorporating financial economics theory into the learning, and we provide an economic interpretation of the successful predictions obtained from neural networks by decomposing the neural predictors as…
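
As a rough sketch of the alpha/beta decomposition alluded to above, the snippet below parameterizes time-varying alphas and betas as neural-network functions of firm characteristics and combines them with factor returns. The architecture, layer sizes, and factor inputs are hypothetical placeholders, not the specification developed in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical architecture (not the authors' model): excess return
#   r_{it} = alpha(z_{i,t-1}) + beta(z_{i,t-1})' f_t + eps_{it},
# where z are firm characteristics and f_t are factor returns.
class AlphaBetaNet(nn.Module):
    def __init__(self, n_char: int, n_factors: int, hidden: int = 32):
        super().__init__()
        self.alpha = nn.Sequential(nn.Linear(n_char, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.beta = nn.Sequential(nn.Linear(n_char, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_factors))

    def forward(self, z, f):
        # z: (n_assets, n_char) characteristics; f: (n_factors,) factor returns
        return self.alpha(z).squeeze(-1) + self.beta(z) @ f

net = AlphaBetaNet(n_char=10, n_factors=3)
z = torch.randn(500, 10)        # synthetic characteristics for 500 firms
f = torch.randn(3)              # synthetic factor realizations for one period
r_hat = net(z, f)               # predicted excess returns, shape (500,)
```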

Distance-based summaries and modeling of evolutionary trees

Julia Palacios, Stanford University
E18-304

Abstract: Phylogenetic trees are mathematical objects of great importance used to model hierarchical data and evolutionary relationships, with applications in many fields including evolutionary biology and genetic epidemiology. Bayesian phylogenetic inference usually explores the posterior distribution of trees via Markov chain Monte Carlo methods; however, assessing uncertainty and summarizing distributions remain challenging for these types of structures. In this talk I will first introduce a distance metric on the space of unlabeled ranked tree shapes and genealogies. I will then…

Coding convex bodies under Gaussian noise, and the Wills functional

Jaouad Mourtada, ENSAE Paris
E18-304

Abstract: We consider the problem of sequential probability assignment in the Gaussian setting, where one aims to predict (or equivalently compress) a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a general domain. First, in the case of a convex constraint set K, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of K. We then establish a comparison inequality for the minimax regret…
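
For reference, the Wills functional named in the title can be written in terms of the intrinsic volumes V_j(K) of a convex body K in R^n via a classical identity (quoted here for context, not from the abstract):

\[
  \mathcal{W}(K) \;=\; \sum_{j=0}^{n} V_j(K) \;=\; \int_{\mathbb{R}^n} e^{-\pi\,\mathrm{dist}(x,K)^2}\, dx .
\]

Heuristically, the Gaussian-integral form on the right is what ties the intrinsic volumes of K to prediction and coding under Gaussian noise.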

Inference for Longitudinal Data After Adaptive Sampling

Susan Murphy, Harvard University
E18-304

Abstract: Adaptive sampling methods, such as reinforcement learning (RL) and bandit algorithms, are increasingly used for the real-time personalization of interventions in digital applications like mobile health and education. As a result, there is a need to be able to use the resulting adaptively collected user data to address a variety of inferential questions, including questions about time-varying causal effects. However, current methods for statistical inference on such data (a) make strong assumptions regarding the environment dynamics, e.g., assume the…
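
To illustrate what "adaptively collected" means here, the toy simulation below (a generic epsilon-greedy bandit, not the algorithms or inference methods of the talk) shows that the action at time t depends on earlier rewards, so per-arm sample sizes are random and the i.i.d. assumptions behind standard inference no longer hold.

```python
import numpy as np

# Generic illustration of adaptively collected data from a two-armed bandit.
rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5])       # unknown mean rewards of two actions
counts = np.zeros(2)
sums = np.zeros(2)

for t in range(1000):
    if rng.random() < 0.1 or counts.min() == 0:   # epsilon-greedy exploration
        a = rng.integers(2)
    else:
        a = int(np.argmax(sums / counts))          # exploit current estimates
    r = rng.binomial(1, true_means[a])             # observe a reward
    counts[a] += 1                                 # sample sizes depend on past data
    sums[a] += r

print("sample sizes per arm:", counts)
print("empirical means:", sums / np.maximum(counts, 1))
```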

Generative Models, Normalizing Flows, and Monte Carlo Samplers

Eric Vanden-Eijnden, New York University
E18-304

Abstract: Contemporary generative models used in the context of unsupervised learning have primarily been designed around the construction of a map between two probability distributions that transforms samples from the first into samples from the second. Advances in this domain have been governed by the introduction of algorithms or inductive biases that make learning this map, and the Jacobian of the associated change of variables, more tractable. The challenge is to choose what structure to impose on the transport to…
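
As a minimal example of the map-and-Jacobian bookkeeping described above (a generic change-of-variables computation, not the transport construction discussed in the talk), the sketch below evaluates the exact log-density of a one-layer affine flow.

```python
import numpy as np

# Change of variables for a flow x = T(z) with z ~ base density p_Z:
#   log p_X(x) = log p_Z(T^{-1}(x)) + log |det J_{T^{-1}}(x)|.
# Here T(z) = a * z + b is elementwise affine, so the Jacobian is diagonal.
rng = np.random.default_rng(0)
d = 3
a = np.array([0.5, 2.0, 1.5])            # elementwise scales
b = np.array([1.0, -1.0, 0.0])           # shifts

def log_px(x):
    z = (x - b) / a                      # inverse map T^{-1}
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * d * np.log(2 * np.pi)  # standard normal base
    log_det = -np.sum(np.log(np.abs(a)))                          # log |det J_{T^{-1}}|
    return log_pz + log_det

x = a * rng.standard_normal(d) + b       # a sample from the flow
print("log-density under the flow:", log_px(x))
```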

On the statistical cost of score matching

Andrej Risteski, Carnegie Mellon University
E18-304

Abstract: Energy-based models are a recent class of probabilistic generative models wherein the distribution being learned is parametrized up to a constant of proportionality (i.e., a partition function). Fitting such models using maximum likelihood (i.e., finding the parameters which maximize the probability of the observed data) is computationally challenging, as evaluating the partition function involves a high-dimensional integral. Thus, newer incarnations of this paradigm instead minimize other losses that obviate the need to evaluate partition functions. Prominent examples include score matching (in which we fit…
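
For context, the standard (Hyvärinen) score matching objective fits the model score \(\nabla_x \log p_\theta(x)\), which does not involve the partition function, by minimizing

\[
  J(\theta) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\tfrac{1}{2}\,\bigl\|\nabla_x \log p_\theta(x)\bigr\|^2 \;+\; \Delta_x \log p_\theta(x)\right],
\]

where \(\Delta_x\) denotes the trace of the Hessian in x; the variants analyzed in the talk may differ in the exact loss.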

Spectral pseudorandomness and the clique number of the Paley graph

Dmitriy (Tim) Kunisky, Yale University
E18-304

Abstract: The Paley graph is a classical number-theoretic construction of a graph that is believed to behave "pseudorandomly" in many regards. Accurately bounding the clique number of the Paley graph is a long-standing open problem in number theory, with applications to several other questions about the statistics of finite fields. I will present recent results studying the application of convex optimization and spectral graph theory to this problem, which involve understanding the extent to which the Paley graph is "spectrally…
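
For context, here is a small self-contained construction of the object in question (an illustration only, separate from the convex-optimization and spectral machinery of the talk): the Paley graph on a prime p ≡ 1 (mod 4) joins a and b whenever a - b is a nonzero quadratic residue mod p, and its clique number can be brute-forced for small p.

```python
from itertools import combinations

# Paley graph on Z/pZ for a prime p = 1 (mod 4): edge {a, b} iff a - b is a
# nonzero quadratic residue mod p.  Brute-force the clique number for small p.
p = 13                                                   # 13 = 1 (mod 4)
residues = {(x * x) % p for x in range(1, p)}            # nonzero quadratic residues

def adjacent(a, b):
    return (a - b) % p in residues                       # symmetric since -1 is a residue

def clique_number(p):
    best = 1
    for size in range(2, p + 1):
        found = any(all(adjacent(a, b) for a, b in combinations(c, 2))
                    for c in combinations(range(p), size))
        if not found:
            break
        best = size
    return best

print("clique number of the Paley graph on", p, "vertices:", clique_number(p))  # 3 for p = 13
```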

