Automated Data Summarization for Scalability in Bayesian Inference
September 11, 2019 @ 4:00 pm - 5:00 pm
Tamara Broderick (MIT)
IDS.190 – Topics in Bayesian Modeling and Computation
Many algorithms take prohibitively long to run on modern, large datasets. But even in complex datasets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a “coreset”) that is much smaller than the original dataset. Running algorithms on a much smaller dataset will typically take much less computing time, but it remains to understand whether the output can be widely useful. (1) In particular, can running an analysis on a small coreset yield answers close to those from running on the full dataset? (2) And can useful coresets be constructed automatically for new analyses, with minimal extra work from the user? We answer both questions in the affirmative for a wide variety of problems in Bayesian inference. We demonstrate how to construct “Bayesian coresets” as an automatic, practical pre-processing step. We prove that our method provides geometric decay in the relevant approximation error as a function of coreset size. Empirical analysis shows that our method reduces approximation error by orders of magnitude relative to uniform random subsampling of the data. Though we focus on Bayesian methods here, we also show that our construction can be applied in other domains.
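To make the weighted-subset idea concrete, here is a minimal Python sketch, assuming a toy one-dimensional Gaussian model with known unit variance. It builds only the uniform-random-subsampling baseline that the abstract compares against: each of the m sampled points gets weight N/m, so the weighted log-likelihood is an unbiased estimate of the full-data log-likelihood. The model and all function names (`gaussian_loglik`, `uniform_coreset`, `coreset_loglik`) are hypothetical illustrations; this is not the speaker's Bayesian coreset construction.

```python
import math
import random

# Hypothetical toy model (not from the talk): x ~ N(theta, sigma^2).
def gaussian_loglik(x, theta, sigma=1.0):
    """Log-density of a single observation x under N(theta, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - theta) ** 2 / (2 * sigma**2)

def full_loglik(data, theta):
    """Full-data log-likelihood: one term per data point."""
    return sum(gaussian_loglik(x, theta) for x in data)

def uniform_coreset(data, m, seed=0):
    """Uniform-subsampling baseline: draw m distinct points and give each
    weight N/m, so the weighted sum is unbiased for the full sum."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), m)
    weight = len(data) / m
    return [(data[i], weight) for i in idx]

def coreset_loglik(coreset, theta):
    """Weighted log-likelihood evaluated on the (point, weight) pairs only."""
    return sum(w * gaussian_loglik(x, theta) for x, w in coreset)

# Synthetic data, illustration only.
data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(10_000)]
coreset = uniform_coreset(data, m=100)

for theta in (1.5, 2.0, 2.5):
    full = full_loglik(data, theta)
    approx = coreset_loglik(coreset, theta)
    rel_err = abs(approx - full) / abs(full)
    print(f"theta={theta}: full={full:.1f} coreset={approx:.1f} rel_err={rel_err:.3f}")
```

Evaluating the coreset costs 100 log-density terms instead of 10,000. The construction described in the talk instead chooses which points to keep and how to weight them, which the abstract reports yields orders-of-magnitude lower approximation error than this uniform baseline; that selection step is not reproduced here.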
Tamara Broderick is an Associate Professor in EECS at MIT.
Meetings are open to any interested researcher.
Taking IDS.190 satisfies the seminar requirement for students in MIT’s Interdisciplinary Doctoral Program in Statistics (IDPS), but formal registration is open to any graduate student who can register for MIT classes. For more information and an up-to-date schedule, please see https://stellar.mit.edu/S/course/IDS/fa19/IDS.190/