Why Aren’t Network Statistics Accompanied By Uncertainty Statements?
March 1 @ 11:00 am - 12:00 pm
Eric Kolaczyk (Boston University)
Over 500K scientific articles have been published since 1999 with the word “network” in the title. The vast majority of these report network summary statistics of one type or another. However, these numbers are rarely accompanied by any quantification of uncertainty. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, must propagate to any summary statistics reported. Perhaps surprisingly, there is little in the way of formal statistical methodology for this problem. I summarize results from our recent work for the case of estimating the density of low-order subgraphs. Under a simple model of network error, we show that consistent estimation of such densities is impossible when the rates of error are unknown and only a single network is observed. We then develop method-of-moments estimators of subgraph density and error rates for the case where a minimal number of network replicates are available (i.e., just 2 or 3). These estimators are shown to be asymptotically normal as the number of vertices increases to infinity. We also provide confidence intervals for quantifying the uncertainty in these estimates, implemented through a novel bootstrap algorithm. We illustrate the use of our estimators in the context of gene coexpression networks, where the correction for measurement error is found to have a potentially substantial impact on standard summary statistics. This is joint work with Qiwei Yao and Jinyuan Chang.
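To give a feel for the replicate-based idea in the abstract, here is a minimal sketch for the simplest low-order subgraph, a single edge. It is not the speaker's estimator. It is a generic method-of-moments calculation under assumptions introduced here for illustration: each non-edge is falsely reported with probability `alpha`, each true edge is missed with probability `beta`, errors are independent across vertex pairs and across three replicates, and `alpha + beta < 1`. All function names are hypothetical. With a single replicate only the first moment is available, which is one equation in three unknowns; that is one informal way to see why a single observed network cannot pin down the density when the error rates are unknown.

```python
import numpy as np

def simulate_replicates(n, delta, alpha, beta, rng):
    """Draw 3 noisy replicates of a random graph, stored as flat vectors of dyads."""
    n_dyads = n * (n - 1) // 2
    truth = rng.random(n_dyads) < delta      # true edge indicators
    noise = rng.random((3, n_dyads))         # independent noise per replicate
    # True edges are observed w.p. 1 - beta; non-edges are observed w.p. alpha.
    return np.where(truth, noise < 1.0 - beta, noise < alpha)

def empirical_moments(Y):
    """Frequencies with which a dyad is observed in 1, 2, and all 3 replicates."""
    m1 = Y.mean()
    m2 = np.mean([(Y[i] & Y[j]).mean() for i, j in [(0, 1), (0, 2), (1, 2)]])
    m3 = (Y[0] & Y[1] & Y[2]).mean()
    return m1, m2, m3

def solve_moments(m1, m2, m3):
    """Invert m_k = (1 - delta) * alpha**k + delta * (1 - beta)**k for k = 1, 2, 3."""
    # p = alpha and q = 1 - beta are the roots of x^2 + c1*x + c0 = 0,
    # where (c0, c1) solve a 2x2 linear system built from the moments.
    c0, c1 = np.linalg.solve([[1.0, m1], [m1, m2]], [-m2, -m3])
    p, q = np.sort(np.roots([1.0, c1, c0]).real)   # p < q since alpha + beta < 1
    delta = (m1 - p) / (q - p)
    return delta, p, 1.0 - q

# Illustrative parameter values only; they are not taken from the talk.
rng = np.random.default_rng(0)
Y = simulate_replicates(400, delta=0.10, alpha=0.05, beta=0.20, rng=rng)
delta_hat, alpha_hat, beta_hat = solve_moments(*empirical_moments(Y))
print(f"delta_hat={delta_hat:.3f}, alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```

The naive density estimate `m1` mixes true edges with false positives, while the moment inversion separates the two. The sketch does not cover the asymptotic normality results or the bootstrap confidence intervals mentioned in the abstract.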
Eric Kolaczyk is a Professor of Statistics and Director of the Program in Statistics in the Department of Mathematics & Statistics at Boston University. He is also a university Data Science Faculty Fellow and is affiliated with the Division of Systems Engineering and the Programs in Bioinformatics and in Computational Neuroscience. His current research interests revolve mainly around the statistical analysis of network-indexed data, including both theory/methods development and collaborative research. He has published several books on the topic of network analysis. As an associate editor, he has served on the boards of JASA and JRSS-B in statistics, IEEE IP and TNSE in engineering, and SIMODS in mathematics. Currently he is the co-chair of the NAS Roundtable on Data Science Education. He is an elected fellow of the AAAS, ASA, and IMS, an elected senior member of the IEEE, and an elected member of the ISI.
The MIT Statistics and Data Science Center hosts guest lecturers from around the world in this weekly seminar.