A Very Sketchy Talk
Abstract
One of the key problems in unsupervised learning is learning a probability distribution from samples. In this talk I'll discuss a fundamental family of distributions: tree-structured graphical models. Trees are a rich family, flexible enough to capture higher-order interactions among the variables, yet with a simple structure that makes them amenable to efficient learning and inference.
I’ll present an analysis of a classical algorithm for learning tree-structured distributions due to Chow and Liu from 1968. Despite being a fundamental algorithm that is widely used in practice, tight sample complexity bounds for the Chow-Liu algorithm have remained elusive. I’ll show that the algorithm learns tree-structured distributions to within error ε in total variation distance using O(p log p / ε^2) samples, where p is the number of variables. This is the first optimal algorithm for the problem in terms of the dependence on p. Our techniques also apply to the Gaussian case, where we show that O(p log p / ε^2) samples suffice to learn a tree-structured Gaussian graphical model to within error ε in Frobenius norm.
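The Chow-Liu algorithm itself is simple: estimate the mutual information between every pair of variables from the samples, then return a maximum-weight spanning tree under those weights. The sketch below is a minimal illustration for binary variables, using plug-in (empirical) mutual information estimates and Kruskal's algorithm with union-find; the function names and the restriction to binary alphabets are my own simplifications, not part of the analysis described in the talk.

```python
import math
from collections import Counter


def empirical_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) from a list of sample tuples."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts c, pi[a], pj[b]
        mi += (c / n) * math.log(c * n / (pi[a] * pj[b]))
    return mi


def chow_liu_tree(samples, p):
    """Maximum-weight spanning tree over pairwise mutual informations (Kruskal)."""
    edges = sorted(
        ((empirical_mutual_information(samples, i, j), i, j)
         for i in range(p) for j in range(i + 1, p)),
        reverse=True,
    )
    parent = list(range(p))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, on samples from a chain where X_1 copies X_0 exactly and X_2 agrees with X_1 most of the time, the strongest edge (0, 1) is always recovered, and the output has p − 1 = 2 edges. The sample complexity question in the talk is exactly how many samples these plug-in estimates need before the recovered tree yields a distribution ε-close in total variation.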
Bio
David Woodruff is a Professor at Carnegie Mellon University. He received his PhD from MIT in 2007 and was a research scientist at IBM Almaden Research Center from 2007 to 2014. He has served on over 50 program committees and won several best paper awards. His interests include sublinear algorithms for big data, machine learning, numerical linear algebra, and data streams.