WiDS Cambridge 2023
WiDS Cambridge is a hybrid one-day technical conference that will feature an all-female lineup of speakers from academia and industry to talk about the latest data science-related research in a number of domains.
Abstract: Sampling from high-dimensional probability distributions is a fundamental and challenging problem encountered throughout science and engineering. One of the most popular approaches to tackle such problems is the Markov chain Monte Carlo (MCMC) paradigm. While MCMC algorithms are often simple to implement and widely used in practice, analyzing the rate of convergence to stationarity, i.e. the "mixing time", remains a challenging problem in many settings. I will describe a new technique based on pairwise correlations called "spectral independence", which has been…
Abstract: In this talk I will propose new tools for the exploratory data analysis of data objects taking values in a general separable metric space. First, I will introduce depth profiles, where the depth profile of a point ω in the metric space refers to the distribution of the distances between ω and the data objects. I will describe how depth profiles can be harnessed to define transport ranks, which capture the centrality of each element in the metric space with respect to the…
Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty. At its heart lies the Bellman optimality operator, whose unique fixed point specifies an optimal policy and value function. In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees. For policy evaluation problems based on on-line data, we present Krylov-Bellman boosting, which combines ideas from Krylov methods with non-parametric boosting. For policy optimization problems based on…
Abstract: We identify and reduce bias in the leading sample eigenvector of a high-dimensional covariance matrix of correlated variables. Our analysis illuminates how error in an estimated covariance matrix corrupts optimization. It may be applicable in finance, machine learning and genomics. Biography: Lisa Goldberg is Head of Research at Aperio and Managing Director at BlackRock. She is Professor of the Practice of Economics at University of California, Berkeley, where she co-directs the Center for Data Analysis in Risk, an industry…
Abstract: Regression discontinuity design (RDD) is a quasi-experimental impact evaluation method ubiquitous in the social and applied health sciences. It aims to estimate average treatment effects of policy interventions by exploiting jumps in outcomes induced by cut-off assignment rules. Here, we establish a correspondence between the RDD setting and free discontinuity problems, in particular the celebrated Mumford-Shah model in image segmentation. The Mumford-Shah model is non-convex and hence admits local solutions in general. We circumvent this issue by relying on…
This celebratory event reflects on the impact in research and education the Institute for Data, Systems, and Society has had since its launch in 2015 and explores future opportunities with thought leaders and policy experts. In panels and plenary talks, we will discuss the impact of research areas utilizing the available massive data, in-depth understanding of underlying social and engineering systems, and the investigation of social and institutional behavior to provide answers to critical and complex challenges. For more information,…
Abstract: This talk is based on two recent papers: 1. “On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation” and 2. “Convergence Rates of Oblique Regression Trees for Flexible Function Libraries” 1. Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference are conducted at specific values of…
Abstract: Domain adaptation, transfer, multitask, meta, few-shot, representation, or lifelong learning … these are all important recent directions in ML that touch on the core of what we might mean by ‘AI’. As these directions all concern learning in heterogeneous and ever-changing environments, they share a central question: what information a data distribution may have about another, critically, in the context of a given estimation problem, e.g., classification, regression, bandits, etc. Our understanding of these problems is still…
Abstract: In this talk, I will argue that it is sometimes possible to learn, with techniques originating from bandits, the "hints" on which learning-augmented algorithms rely to improve worst-case performance. We will describe this phenomenon, the combination of online learning with competitive analysis, using the example of stochastic online scheduling. We shall quantify the merits of this approach by computing and comparing non-asymptotic expected competitive ratios (the standard performance measure of algorithms). Bio: Vianney Perchet is a professor at the…