Sample complexity of population recovery
Yury Polyanskiy (MIT)
September 15 @ 11:00 am
In this talk we will first consider a general question of estimating linear functional of the distribution based on the noisy samples from it. We discover that the (two-point) LeCam lower bound is in fact achievable by optimizing bias-variance tradeoff of an empirical-mean type of estimator.
Next, we apply this general framework to the specific problem of population recovery. Namely, consider a random poll of sample size n conducted on a population of individuals, where each pollee is asked to answer d binary questions. We consider one of the two polling impediments:
- in lossy population recovery, a pollee may skip each question with probability ε;
- in noisy population recovery, a pollee may lie on each question with probability ε.
Given n lossy or noisy samples, the goal is to estimate the probabilities of all 2d binary vectors simultaneously within accuracy δ with high probability. We settle the sample complexity for both models and discover curious phase-transition in the lossy model at ε = 1/2. Our results hinge on complex-analytic methods, such as Hadamard’s three-lines theorem.
Biography: Yury Polyanskiy is an Associate Professor of Electrical Engineering and Computer Science and a member of LIDS at MIT. Yury received M.S. degree in applied mathematics and physics from the Moscow Institute of Physics and Technology, Moscow, Russia in 2005 and Ph.D. degree in electrical engineering from Princeton University, Princeton, NJ in 2010. Currently, his research focuses on basic questions in information theory, error-correcting codes, wireless communication and fault tolerant and defect-tolerant circuits. Dr. Polyanskiy won the 2013 NSF CAREER award and 2011 IEEE Information Theory Society Paper Award.