Nonparametric Bayesian Statistics

The promise of Big Data isn’t simply to estimate a mean with greater accuracy; rather, practitioners are interested in learning complex, hierarchical information from data sets. Bayesian statistics allows not only this flexible modeling but also a coherent treatment of model uncertainty as data accrue. Bayesian nonparametrics goes a step further by providing models whose complexity grows with the size of the data. We expect to see, e.g., a greater diversity of topics as we read more documents from a news publication, a greater diversity of image subjects as we view more photographs online, and more friend groups as we examine more individuals participating in a social network. Bayesian nonparametrics provides modeling solutions in all of these cases by replacing the finite-dimensional prior distributions of classical Bayesian analysis with infinite-dimensional stochastic processes. Novel structures and relationships in data—from clustering, to admixtures, to graphs, to phylogenetic trees—motivate the creation of new Bayesian nonparametric models. And data sets of increasing size call for new computational tools and algorithms for learning and inference in these models. Our work demonstrates how to retain the strengths of the Bayesian paradigm and infinite-dimensional, nonparametric analysis while simultaneously enabling fast, and even streaming, inference on modern, large data sets.
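As a rough illustration of complexity that grows with the data, the sketch below simulates the Chinese restaurant process, a standard Bayesian nonparametric clustering prior. It is chosen here only as an example and is not taken from the text above. The function name `crp_cluster_counts` and the concentration parameter `alpha` are hypothetical names introduced for this sketch.

```python
import random


def crp_cluster_counts(n, alpha, rng):
    """Simulate a Chinese restaurant process with n data points.

    Returns (counts, sizes): counts[i] is the number of clusters after
    i + 1 points have been seen, and sizes holds the final cluster sizes.
    `alpha` is the concentration parameter (an assumed name for this sketch).
    """
    sizes = []   # sizes[k] = number of points currently in cluster k
    counts = []  # running number of clusters
    for i in range(n):
        # Point i starts a new cluster with probability alpha / (i + alpha).
        # For i == 0 this probability is 1, so the first point always does.
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            # Otherwise it joins an existing cluster with probability
            # proportional to that cluster's current size.
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
        counts.append(len(sizes))
    return counts, sizes


if __name__ == "__main__":
    counts, sizes = crp_cluster_counts(10_000, alpha=1.0, rng=random.Random(0))
    for n_seen in (10, 100, 1_000, 10_000):
        print(n_seen, "points ->", counts[n_seen - 1], "clusters")
```

Under this prior the number of clusters is not fixed in advance: it never decreases as points arrive, and its expected value grows roughly like `alpha * log(n)`, so larger data sets tend to reveal more groups.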

Tamara Broderick

ITT Career Development Assistant Professor

Selected Publications

T. Broderick, A. C. Wilson, and M. I. Jordan. Posteriors, conjugacy, and exponential families for completely random measures. Submitted.

T. Broderick, B. Kulis, and M. I. Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In International Conference on Machine Learning, 2013.

T. Broderick, J. Pitman, and M. I. Jordan. Feature allocations, probability functions, and paintboxes. Bayesian Analysis, 2013.

T. Broderick, M. I. Jordan, and J. Pitman. Cluster and feature modeling from combinatorial stochastic processes. Statistical Science, 28(3):289–312, 2013.

T. Broderick, N. Boyd, A. Wibisono, A. C. Wilson, and M. I. Jordan. Streaming variational Bayes. In Neural Information Processing Systems, 2013.