IDSS Distinguished Seminars


  • IDSS Distinguished Speaker Seminar

    On March 6, 2023, from 4:00 pm to 5:00 pm
    Robert Hampshire, University of Michigan
    E18-304

    Find out more »: IDSS Distinguished Speaker Seminar
  • IDSS Distinguished Speaker Seminar with Eliana La Ferrara (Bocconi University)

    On May 4, 2021, from 3:00 pm to 4:00 pm
    Eliana La Ferrara (Bocconi University)
    online

    About the speaker: Eliana La Ferrara is the Invernizzi Chair in Development Economics at Bocconi University, Milan, where she also directs the Laboratory for Effective Anti-poverty Policies (LEAP). Eliana received a PhD in Economics from Harvard University in 1999. She was president of the Bureau for Research and Economic Analysis of Development (BREAD) from 2016 to 2019 and president of the European Economic Association in 2018. She is a Fellow of the Econometric Society and of CEPR, EUDN, and IGIER, a J-PAL Affiliate, and a Member of the American Academy of Arts and Sciences. Her research focuses on development economics and political economy, particularly on the role of social factors in economic development. She has studied ethnic diversity, kin structure and social norms, and the effects of television on social outcomes. She has also investigated political constraints to development, with particular focus on violent conflict in Africa. Her work has been published in the American Economic Review, the Quarterly Journal of Economics, the Journal of Development Economics, the American Economic Journal: Applied Economics, and the Journal of the European Economic Association.

    Zoom meeting ID: TBD

    Join Zoom meeting: TBD

    YouTube livestream: TBD

    Find out more »: IDSS Distinguished Speaker Seminar with Eliana La Ferrara (Bocconi University)
  • James-Stein for eigenvectors: reducing the optimization bias in Markowitz portfolios

    On April 3, 2023, from 4:00 pm to 5:00 pm
    Lisa Goldberg, UC Berkeley

    Abstract:

    We identify and reduce bias in the leading sample eigenvector of a high-dimensional covariance matrix of correlated variables. Our analysis illuminates how error in an estimated covariance matrix corrupts optimization. It may be applicable in finance, machine learning and genomics.
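
    To make the setting concrete, here is a minimal numerical sketch (invented data; not the estimator from the talk): shrink the entries of the leading sample eigenvector toward their cross-sectional mean. The shrinkage constant c below is an assumed placeholder, whereas the talk derives a data-driven choice from the eigenvalue spectrum.

    import numpy as np

    rng = np.random.default_rng(0)

    # One-factor model with p assets and only n << p observations.
    p, n = 500, 60
    beta = 1.0 + 0.3 * rng.standard_normal(p)     # true factor loadings
    returns = np.outer(rng.standard_normal(n), beta) + 0.8 * rng.standard_normal((n, p))

    # Leading eigenvector of the sample covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(returns, rowvar=False))
    h = np.sign(eigvecs[:, -1] @ beta) * eigvecs[:, -1]   # fix sign for comparison

    # James-Stein-style correction (sketch): shrink entries toward their mean.
    m, c = h.mean(), 0.5                          # c: assumed shrinkage constant
    h_js = m + c * (h - m)
    h_js /= np.linalg.norm(h_js)

    b = beta / np.linalg.norm(beta)               # true factor direction
    angle = lambda u: np.degrees(np.arccos(np.clip(abs(u @ b), 0.0, 1.0)))
    print("angle to truth, sample eigenvector:", round(angle(h), 2))
    print("angle to truth, JS eigenvector    :", round(angle(h_js), 2))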

    Biography:

    Lisa Goldberg is Head of Research at Aperio and a Managing Director at BlackRock. She is Professor of the Practice of Economics at the University of California, Berkeley, where she co-directs the Center for Data Analysis in Risk, an industry partnership that supports research at the intersection of financial economics and data science. Lisa is a mathematician whose research has touched topology, dynamical systems, quantitative finance, sports analytics, personalized investing, and high-dimensional statistics. She is a co-author of Portfolio Risk Analysis, published by Princeton University Press in 2010, and an inventor on five patents. Lisa is on track to complete a swim around the equator, roughly 40,000 km, in 2036.

    Find out more »: James-Stein for eigenvectors: reducing the optimization bias in Markowitz portfolios
  • Inference for Longitudinal Data After Adaptive Sampling

    On December 5, 2022, from 4:00 pm to 5:00 pm
    Susan Murphy, Harvard University
    E18-304

    Abstract: Adaptive sampling methods, such as reinforcement learning (RL) and bandit algorithms, are increasingly used for the real-time personalization of interventions in digital applications like mobile health and education. As a result, there is a need to be able to use the resulting adaptively collected user data to address a variety of inferential questions, including questions about time-varying causal effects. However, current methods for statistical inference on such data (a) make strong assumptions regarding the environment dynamics, e.g., assume the longitudinal data follows a Markovian process, or (b) require data to be collected with one adaptive sampling algorithm per user, which excludes algorithms that learn to select actions using data collected from multiple users. These are major obstacles preventing the use of adaptive sampling algorithms more widely in practice. In this work, we provide statistical inference for the common Z-estimator based on adaptively sampled data. The inference is valid even when (a) observations are non-stationary and highly dependent over time and (b) the online adaptive sampling algorithm learns using the data of all users. Furthermore, our inference method is robust to misspecification of the reward models used by the adaptive sampling algorithm. This work is motivated by our work in designing the Oralytics oral health clinical trial, in which an RL algorithm will be used to select treatments, yet valid statistical inference is essential for conducting primary data analyses after the trial is over.
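
    A toy illustration of the setting (a sketch with invented data, not the Z-estimator theory from the talk): below, an epsilon-greedy algorithm pools all past observations when choosing actions, and an inverse-propensity-weighted estimator built from the logged action probabilities remains unbiased despite the adaptivity.

    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([0.0, 0.3])                 # true mean rewards of arms 0 and 1
    T, eps = 5000, 0.1

    # Epsilon-greedy sampling: the action probabilities at each step
    # depend on all data gathered so far (the pooled, adaptive setting).
    counts, sums = np.zeros(2), np.zeros(2)
    A, R, P = [], [], []
    for t in range(T):
        means = np.divide(sums, counts, out=np.zeros(2), where=counts > 0)
        p = np.full(2, eps / 2)
        p[int(np.argmax(means))] += 1 - eps
        a = rng.choice(2, p=p)
        A.append(a); R.append(mu[a] + rng.standard_normal()); P.append(p[a])
        counts[a] += 1; sums[a] += R[-1]
    a, r, pi = np.array(A), np.array(R), np.array(P)

    # Inverse-propensity-weighted estimate of mu[1] - mu[0]. Because the
    # logged propensities are known, each summand (minus the truth) is a
    # martingale difference, so the estimate is unbiased under adaptivity.
    psi = np.where(a == 1, r / pi, -r / pi)
    effect, se = psi.mean(), psi.std(ddof=1) / np.sqrt(T)
    print(f"estimated effect: {effect:.3f} +/- {1.96 * se:.3f} (truth: 0.3)")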

    Bio:
    Susan Murphy is the Mallinckrodt Professor of Statistics and of Computer Science and the Radcliffe Alumnae Professor at the Radcliffe Institute at Harvard University. Her research focuses on improving sequential, individualized decision making in digital health. She developed the micro-randomized trial for use in constructing digital health interventions; this trial design is in use across a broad range of health-related areas. Her lab works on online learning algorithms for developing personalized digital health interventions. Dr. Murphy is a member of the National Academy of Sciences and of the National Academy of Medicine, both of the US National Academies. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Fellow of the College on Problems of Drug Dependence, Past President of the Institute of Mathematical Statistics, Past President of the Bernoulli Society, and a former editor of the Annals of Statistics.

    Find out more »: Inference for Longitudinal Data After Adaptive Sampling
  • Structural Deep Learning in Financial Asset Pricing

    On November 8, 2022, from 4:00 pm to 5:00 pm
    Jianqing Fan, Princeton University
    E18-304

    Abstract: We develop new structural nonparametric methods, guided by financial economics theory, for estimating conditional asset pricing models using deep neural networks, employing time-varying conditional information on alphas and betas carried by firm-specific characteristics. Contrary to many applications of neural networks in economics, we can open the “black box” of machine learning predictions by incorporating financial economics theory into the learning, and provide an economic interpretation of the successful predictions obtained from neural networks by decomposing the neural predictors into risk-related and mispricing components. Our estimation method starts with period-by-period cross-sectional deep learning, followed by local PCAs to capture time-varying features such as latent factors of the model. We formally establish the asymptotic theory of the structural deep-learning estimators, which applies to both in-sample fit and out-of-sample predictions. We also illustrate the “double-descent-risk” phenomenon associated with over-parametrized predictions, which justifies the use of over-fitting machine learning methods. (Joint work with Tracy Ke, Yuan Liao, and Andreas Neuhierl.)
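
    A stripped-down sketch of the two-step pipeline (invented data; a cubic-polynomial regression stands in for the deep learner, and firm characteristics are held fixed over time so that plain PCA, rather than local PCA, suffices):

    import numpy as np

    rng = np.random.default_rng(2)
    T, N = 120, 200                               # periods, firms

    # Conditional one-factor model: each firm's beta is an unknown
    # nonlinear function of its characteristic x; f_t is a latent factor.
    x = rng.standard_normal(N)                    # firm characteristics
    f = rng.standard_normal(T)                    # latent factor path
    ret = np.tanh(x)[None, :] * f[:, None] + 0.5 * rng.standard_normal((T, N))

    # Step 1 (stand-in for the deep learner): period-by-period
    # cross-sectional fit of returns on a polynomial basis of x.
    B = np.stack([np.ones(N), x, x**2, x**3], axis=1)
    fitted = np.empty_like(ret)
    for t in range(T):
        coef, *_ = np.linalg.lstsq(B, ret[t], rcond=None)
        fitted[t] = B @ coef

    # Step 2: PCA on the fitted panel recovers the latent factor path
    # (up to sign and scale).
    U, S, _ = np.linalg.svd(fitted, full_matrices=False)
    f_hat = U[:, 0] * S[0]
    print("|corr(estimated, true factor)| =", round(abs(np.corrcoef(f_hat, f)[0, 1]), 3))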

    About the speaker: Jianqing Fan is a statistician, financial econometrician, and data scientist. He is the Frederick L. Moore ’18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University, where he chaired the department from 2012 to 2015. He is the winner of the 2000 COPSS Presidents’ Award, the Morningside Gold Medal for Applied Mathematics (2007), a Guggenheim Fellowship (2009), the Pao-Lu Hsu Prize (2013), and the Guy Medal in Silver (2014). He was elected an Academician of Academia Sinica in 2012.

    Fan is interested in statistical theory and methods in data science, statistical machine learning, finance, economics, computational biology, and biostatistics, with particular expertise in high-dimensional statistics, nonparametric modeling, longitudinal and functional data analysis, nonlinear models, survival analysis, time series, and wavelets, among others.

    Find out more »: Structural Deep Learning in Financial Asset Pricing
  • Democracy and the Pursuit of Randomness

    On September 19, 2022, from 4:00 pm to 5:00 pm
    Ariel Procaccia, Harvard University
    E18-304

    Abstract: Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens’ assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens’ assemblies reflect the general population and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition based on experience with the adoption and deployment of our open-source system, Panelot.
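
    The tension is easy to see numerically. Below is a naive baseline with invented data (deployed systems such as Panelot instead optimize the lottery so that selection probabilities are as equal as possible subject to the quotas): draw uniformly random panels and keep only those meeting a demographic quota.

    import numpy as np

    rng = np.random.default_rng(3)
    pool = 100                                    # volunteers in the pool
    urban = rng.random(pool) < 0.7                # ~70% of the pool is urban
    k, quota_urban = 10, 5                        # panel size; exactly 5 urban seats

    # Rejection sampling: uniform random panels, filtered to meet the quota.
    panels = []
    while len(panels) < 2000:
        panel = rng.choice(pool, size=k, replace=False)
        if urban[panel].sum() == quota_urban:
            panels.append(panel)

    # Empirical probability that each volunteer is selected.
    hits = np.bincount(np.concatenate(panels), minlength=pool)
    p = hits / len(panels)
    print("avg P(selected), urban:", round(p[urban].mean(), 3))
    print("avg P(selected), rural:", round(p[~urban].mean(), 3))

    With a pool that is roughly 70% urban but a panel of five urban and five rural seats, rural volunteers are selected with probability about 5/30 ≈ 0.17 versus 5/70 ≈ 0.07 for urban ones: exact representativeness forces unequal individual chances, which is precisely the trade-off the selection algorithms in the talk are designed to balance.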

    Bio: Ariel Procaccia is Gordon McKay Professor of Computer Science at Harvard University. He works on a broad and dynamic set of problems related to AI, algorithms, economics, and society. To translate his research into practice, he has helped create systems and platforms that are regularly used to solve everyday fair division problems, resettle refugees, and select citizens’ assemblies. His distinctions include the Social Choice and Welfare Prize (2020), a Guggenheim Fellowship (2018), the IJCAI Computers and Thought Award (2015), and a Sloan Research Fellowship (2015).

    Find out more »: Democracy and the Pursuit of Randomness
  • Causal Inference and Data Fusion

    On May 2, 2022, from 4:00 pm to 5:00 pm
    Elias Bareinboim (Columbia University)
    E18-304

    Abstract: Causal inference is usually dichotomized into two categories, experimental (Fisher, Cox, Cochran) and observational (Neyman, Rubin, Robins, Dawid, Pearl), which, by and large, are studied separately. Understanding reality is more demanding. Experimental and observational studies are but two extremes of a rich spectrum of research designs that generate the bulk of the data available in practical, large-scale situations. In typical medical explorations, for example, data from multiple observations and experiments are collected, coming from distinct experimental setups, different sampling conditions, and heterogeneous populations.

    In this talk, I will introduce the data-fusion problem, which is concerned with piecing together multiple datasets collected under heterogeneous conditions (to be defined) so as to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to causal analysts since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. I will present my work on a general, non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of fusion in causal inference tasks.
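
    As a small worked example of one fusion task, transportability (a sketch following the z-specific transport formula of reading [1] below; all numbers are invented): an experiment run on a source population is combined with covariate data from a target population whose composition differs.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20000

    # Z is an effect modifier: the treatment effect is 0.5 when z=0
    # and 2.0 when z=1.
    def outcome(x, z):
        return (0.5 + 1.5 * z) * x + z + 0.1 * rng.standard_normal(z.shape)

    # Source: a randomized experiment where P(Z=1) = 0.8.
    z_s = (rng.random(n) < 0.8).astype(float)
    x_s = (rng.random(n) < 0.5).astype(float)     # randomized treatment
    y_s = outcome(x_s, z_s)

    # Target: only covariate data are available, and P*(Z=1) = 0.2.
    z_t = (rng.random(n) < 0.2).astype(float)

    # Transport formula: E*[Y | do(x)] = sum_z E[Y | do(x), z] P*(z).
    effect_by_z = np.array([
        y_s[(x_s == 1) & (z_s == zv)].mean() - y_s[(x_s == 0) & (z_s == zv)].mean()
        for zv in (0.0, 1.0)
    ])
    p_star = np.array([(z_t == 0).mean(), (z_t == 1).mean()])
    naive = y_s[x_s == 1].mean() - y_s[x_s == 0].mean()
    transported = p_star @ effect_by_z
    print(f"effect in source experiment : {naive:.2f}")        # ~ 0.5*0.2 + 2.0*0.8 = 1.7
    print(f"effect transported to target: {transported:.2f}")  # ~ 0.5*0.8 + 2.0*0.2 = 0.8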

    Suggested readings:

    [1] E. Bareinboim and J. Pearl. Causal Inference and the Data-Fusion Problem. Proceedings of the National Academy of Sciences, 113(27): 7345-7352, 2016. https://www.pnas.org/content/113/27/7345

    [2] E. Bareinboim, J. Correa, D. Ibeling, and T. Icard. On Pearl’s Hierarchy and the Foundations of Causal Inference. In Probabilistic and Causal Inference: The Works of Judea Pearl (ACM, Special Turing Series), pp. 507-556, 2022. https://causalai.net/r60.pdf

    [3] K. Xia, K. Lee, Y. Bengio, and E. Bareinboim. The Causal-Neural Connection: Expressiveness, Learnability, and Inference. In Proceedings of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS), 2021. https://causalai.net/r80.pdf

    About the speaker: Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence (CausalAI) Laboratory at Columbia University. His research focuses on causal and counterfactual inference and their applications to data-driven fields in the health and social sciences as well as artificial intelligence and machine learning. His work was the first to propose a general solution to the problem of “data-fusion,” providing practical methods for combining datasets generated under different experimental conditions and plagued with various biases. More recently, Bareinboim has been exploring the intersection of causal inference with decision-making (including reinforcement learning) and explainability (including fairness analysis). Bareinboim received his Ph.D. from the University of California, Los Angeles, where he was advised by Judea Pearl. Bareinboim was named one of “AI’s 10 to Watch” by IEEE, and is a recipient of the NSF CAREER Award, the ONR Young Investigator Award, the Dan David Prize Scholarship, the 2014 AAAI Outstanding Paper Award, and the 2019 UAI Best Paper Award.

    Find out more »: Causal Inference and Data Fusion
  • Multiple Randomization Designs

    On March 4, 2022, from 2:30 pm to 3:30 pm
    Guido Imbens (Stanford University)
    E18-304

    Title: Multiple Randomization Designs

    with Patrick Bajari, Brian Burdick, Lorenzo Masoero, James McQueen, Thomas Richardson, and Ido M. Rosen

    Abstract: In this study, we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers’ behavior may depend on buyers’ treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs.
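
    A minimal simulation of the simplest such design (buyers crossed with sellers, with treatments randomized independently on each side; all numbers invented): the four resulting cells separate a direct effect from a cross-population spillover that a one-sided A/B test would fold together.

    import numpy as np

    rng = np.random.default_rng(5)
    n_b, n_s = 200, 200                           # buyers and sellers

    # Double randomization: treatments assigned independently on each side.
    wb = rng.random(n_b) < 0.5                    # buyer-side assignment
    ws = rng.random(n_s) < 0.5                    # seller-side assignment

    # Outcome for every buyer-seller pair: a direct buyer-treatment effect
    # (0.5) plus a spillover from the seller's treatment status (0.2).
    B, S = np.meshgrid(wb, ws, indexing="ij")
    y = 1.0 + 0.5 * B + 0.2 * S + 0.1 * rng.standard_normal((n_b, n_s))

    # Means of the four cells. A one-sided A/B test observes only a mix of
    # these cells and folds the spillover into its treatment contrast.
    cell = {(b, s): y[(B == b) & (S == s)].mean() for b in (0, 1) for s in (0, 1)}
    print("direct effect ~=", round(cell[1, 0] - cell[0, 0], 3))
    print("spillover     ~=", round(cell[0, 1] - cell[0, 0], 3))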

    About the speaker: Guido Imbens is The Applied Econometrics Professor at the Stanford Graduate School of Business and Professor of Economics in the Economics Department at Stanford University. Currently he is also the Amman Mineral Faculty Fellow at the GSB. He held tenured positions at UCLA, UC Berkeley, and Harvard University before joining Stanford in 2012. Imbens specializes in econometrics, and in particular methods for drawing causal inferences from experimental and observational data. He has published extensively in the leading economics and statistics journals. Together with Donald Rubin he has published a book, “Causal Inference in Statistics, Social and Biomedical Sciences.” Guido Imbens is a fellow of the Econometric Society, the Royal Holland Society of Sciences and Humanities, the Royal Netherlands Academy of Sciences, the American Academy of Arts and Sciences, and the American Statistical Association. He holds an honorary doctorate from the University of St. Gallen. In 2017 he received the Horace Mann medal at Brown University. In 2021 he shared the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel with David Card and Joshua Angrist for “methodological contributions to the analysis of causal relationships.” Currently Imbens is Editor of Econometrica.

    Find out more »: Multiple Randomization Designs
  • Neural networks: optimization, transition to linearity and deviations therefrom

    On April 4, 2022, from 4:00 pm to 5:00 pm
    Mikhail Belkin (UC San Diego)
    E18-304

    Title: Neural networks: optimization, transition to linearity and deviations therefrom

    Abstract: The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. I will first discuss some general mathematical principles allowing for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. I will argue that optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter space, allowing for efficient optimization by gradient descent or SGD.
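
    For reference, the standard one-step argument behind this claim (a sketch, assuming $\inf L = 0$ and $\beta$-smoothness of $L$): if the PL condition
    $$\tfrac{1}{2}\,\|\nabla L(w)\|^2 \;\ge\; \mu\, L(w)$$
    holds, then gradient descent with step size $\eta = 1/\beta$ satisfies
    $$L(w_{t+1}) \;\le\; L(w_t) - \tfrac{1}{2\beta}\,\|\nabla L(w_t)\|^2 \;\le\; \left(1 - \tfrac{\mu}{\beta}\right) L(w_t),$$
    so the loss converges linearly to zero with no convexity assumption.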

    As a separate but related development, I will talk about the remarkable recently discovered phenomenon of transition to linearity (constancy of the NTK), in which networks become linear functions of their parameters as their width increases. In particular, I will present a quite general form of the transition to linearity for a broad class of feed-forward networks corresponding to arbitrary directed graphs. It turns out that the width of such networks is characterized by the minimum in-degree of their graphs, excluding the input layer and the first layer.
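
    The phenomenon can be checked numerically. Here is a sketch for a two-layer tanh network in the 1/sqrt(width) parameterization (an illustration with invented data, not the graph-based result from the talk): the relative change in the network's Jacobian after one gradient step shrinks as the width grows, i.e., the model looks increasingly linear in its parameters.

    import numpy as np

    def jacobian(W, v, X):
        # f(x) = v @ tanh(W x) / sqrt(m); returns df/d(params) for each input.
        m = len(v)
        H = np.tanh(X @ W.T)                      # n x m hidden activations
        dV = H / np.sqrt(m)                       # df/dv
        dW = (v / np.sqrt(m)) * (1 - H**2)        # chain-rule factor, n x m
        dWfull = dW[:, :, None] * X[:, None, :]   # df/dW, n x m x d
        return np.concatenate([dV, dWfull.reshape(len(X), -1)], axis=1)

    rng = np.random.default_rng(6)
    n, d = 20, 5
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    for m in (10, 100, 1000, 10000):
        W, v = rng.standard_normal((m, d)), rng.standard_normal(m)
        J0 = jacobian(W, v, X)
        f = np.tanh(X @ W.T) @ v / np.sqrt(m)
        g = (f - y) @ J0 / n                      # gradient of mean squared loss
        v1, W1 = v - g[:m], W - g[m:].reshape(m, d)
        rel = np.linalg.norm(jacobian(W1, v1, X) - J0) / np.linalg.norm(J0)
        print(f"width {m:6d}: relative Jacobian change = {rel:.4f}")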

    Finally, I will mention a very interesting deviation from linearity, the so-called “catapult phase,” a recently identified non-linear and, furthermore, non-perturbative phenomenon, which persists even as neural networks become increasingly linear in the limit of increasing width.

    Based on joint work with Chaoyue Liu, Libin Zhu, and Adit Radhakrishnan.

    About the speaker: Mikhail Belkin received his Ph.D. in 2003 from the Department of Mathematics at the University of Chicago. His research interests are in theory and applications of machine learning and data analysis. Some of his well-known work includes widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral analysis to data science. His recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence necessitated revisiting some of the basic concepts in statistics and optimization. One of his key recent findings is the “double descent” risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation.

    Mikhail Belkin is a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the Journal of Machine Learning Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, and the SIAM Journal on Mathematics of Data Science.

    Find out more »: Neural networks: optimization, transition to linearity and deviations therefrom