Loading Events
  • This event has passed.
Stochastics and Statistics Seminar

Information theory of DNA sequencing

April 20, 2012 @ 11:00 am

Guy Bresler (University of California, Berkeley)

DNA sequencing is the basic workhorse of modern biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original DNA sequence. Despite major effort by the research community, the DNA sequencing problem remains unresolved from a practical perspective: it is currently not known how to produce a good assembly of most read data-sets.

By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly assuming an IID model for the DNA sequence and a noiseless read process. These basic results are then partially extended to arbitrary DNA sequences: the capacity is determined by a simple sufficient statistic of the sequence which can be computed for actual genomes.

The talk is based on joint works with Ma’ayan Bresler, Abolfazl Motahari, and David Tse.

Speaker Bio: Guy Bresler is currently a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Prior to that, he received the B.S. degree in electrical and computer engineering and the M.S. degree in mathematics from the University of Illinois at Urbana-Champaign, both in 2006. Guy is the recipient of an NSF Graduate Research Fellowship, a Vodafone Graduate Fellowship, the Barry M. Goldwater Scholarship, a Vodafone Undergraduate Scholarship, the E.C. Jordan Award from the ECE department at UIUC, and the Roberto Padovani Scholarship from Qualcomm. His research interests include information theory, applied probability, and computational biology.

MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307