Stochastics and Statistics Seminar

Estimating the number of connected components of large graphs based on subgraph sampling

Speaker Name: Yihong Wu (Yale)

Date: February 24, 2017

Time: 11:00am

Location: E18-304


Learning properties of large graphs from samples is an important problem in statistical network analysis, dating back to the early work of Goodman and Frank. We revisit the problem formulated by Frank (1978) of estimating the numbers of connected components in a graph of N vertices based on the subgraph sampling model, where we observe the subgraph induced by n vertices drawn uniformly at random. The key question is whether it is possible to achieve accurate estimation, i.e., vanishing normalized mean-square error, by sampling a vanishing fraction of the vertices. We show that this is impossible if the graph contains high-degree vertices or long induced cycles; otherwise, it is possible by accessing only sublinear number of samples. Optimal sample complexity bounds are obtained for several classes of graphs including forests, cliques and chordal graphs.

The methodology relies on topological identities of graph homomorphism numbers, which, in turn, also play a key role proving minimax lower bounds based on constructing random instances of graphs with matching structures of small subgraphs. This is based on joint work with Jason Klusowski (Yale).

Speaker Bio:

Yihong Wu is an assistant professor in the Department of Statistics and Data Science at Yale University. He received his Ph.D. degree from Princeton University in 2011. He was a postdoctoral fellow with the Statistics Department in The Wharton School at the University of Pennsylvania from 2011 to 2012 and an assistant professor in the Department of ECE at the University of Illinois at Urbana-Champaign from 2013 to 2015. His research interests are in the theoretical and algorithmic aspects of high-dimensional statistics and information theory.