# Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

Jelani Nelson (Harvard University)

## September 29 @ 11:00 am

### Event Navigation

Consider the following problem: we monitor a sequence of edgeinsertions and deletions in a graph on n vertices, so there are N = (n choose 2) possible edges (e.g. monitoring a stream of friend accepts/removals on Facebook). At any point someone may say “query()”, at which point must output a random edge that exists in the graph at that time from a distribution that is statistically close to uniform. More specifically, with probability p our edge should come from a distribution close to uniform, and with probability 1-p our sampler can do anything (i.e. it can say “Fail”, or output a non-existent edge). The primary goal is to design an algorithm that uses very little memory. Whereas the trivial solution maintains N bits, keeping track of which edges currently exist, we show that the optimal space complexity for this problem is Theta(min{N, log(1/p) * log^2(2 + N / log(1/p))}) bits. The upper bound follows by a very minor modification

of previous work, whereas the lower bound is our main novel contribution.

of previous work, whereas the lower bound is our main novel contribution.

This is joint work with Michael Kapralov (EPFL), Jakub Pachocki (OpenAI), Zhengyu Wang (Harvard), David P. Woodruff (IBM Almaden), and Mobin Yahyazadeh (Sharif).