Information Theory for High Throughput Sequencing

Big Data Series

Professor David Tse
Professor, U.C. Berkeley
Given on: Feb. 13th, 2014

Abstract

Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Many assays based on high-throughput sequencing have been designed to make various biological measurements of interest. A key computational problem is that of assembly: how to reconstruct from the many millions of short reads the underlying biological sequence of interest, be it a DNA sequence or a set of RNA transcripts? Traditionally, assembler design is viewed mainly as a software engineering project, where time and memory requirements are primary concerns while the assembly algorithms themselves are designed based on heuristic considerations with no optimality guarantee. In this talk, we outline an alternative approach to assembly design based on information-theoretic principles. Starting with the question of when there is enough information in the reads to reconstruct, we design near-optimal assembly algorithms that can reconstruct with a minimal amount of read information. We illustrate our approach in two settings: DNA sequencing and RNA sequencing. We report preliminary results from ShannonDNA, a DNA assembler, and ShannonRNA, an RNA assembler, and compare their performance both with the fundamental limits and with state-of-the-art software in the field.

Biography

David Tse received the B.A.Sc. degree in systems design engineering from the University of Waterloo in 1989, and the M.S. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology in 1991 and 1994, respectively. From 1994 to 1995, he was a postdoctoral member of technical staff at A.T. & T. Bell Laboratories. Since 1995, he has been at the Department of Electrical Engineering and Computer Sciences at the University of California at Berkeley, where he is currently a Professor. He received a 1967 NSERC graduate fellowship from the government of Canada in 1989, an NSF CAREER award in 1998, the Best Paper Awards at the Infocom 1998 and Infocom 2001 conferences, the Erlang Prize in 2000 from the INFORMS Applied Probability Society, the IEEE Communications and Information Theory Society Joint Paper Awards in 2001 and 2013, the Information Theory Society Paper Award in 2003, the 2009 Frederick Emmons Terman Award from the American Society for Engineering Education, a Gilbreth Lectureship from the National Academy of Engineering in 2012, the Signal Processing Society Best Paper Award in 2012 and the Stephen O. Rice Paper Award in 2013. He is a coauthor, with Pramod Viswanath, of the text “Fundamentals of Wireless Communication”, which has been used at over 60 institutions around the world.