ISL Colloquium

Computational aspects of learning sparse functions with neural networks

Theodor Misiakiewicz – PhD student, Stanford

Thu, 6-Oct-2022 / 4:00pm / Packard 101

Abstract

Deep learning methods have become the standard approach to exploiting massive high-dimensional datasets. Why do they not seem to suffer from the curse of dimensionality? Several works have shown that, in contrast to kernel methods, neural networks (NNs) can adapt to low-dimensional structure and overcome this curse when learning sparse functions (functions that depend only on a latent low-dimensional subspace). However, these works do not provide efficient algorithms for constructing such NNs; in fact, in some cases the problem can be shown to be NP-hard. How do these results change when we require computational efficiency? In particular, when can NNs trained by SGD beat the curse of dimensionality?
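For concreteness, a sparse (multi-index) target of the kind described above can be written as follows; the notation here is a standard formulation, not taken verbatim from the abstract. For an input x in R^d,

f(x) = \varphi(\langle u_1, x \rangle, \ldots, \langle u_r, x \rangle), \qquad r \ll d,

where the vectors u_1, \ldots, u_r span the latent low-dimensional subspace and \varphi : R^r \to R is a link function. Neither \varphi nor the subspace is known in advance, and the computational question is whether gradient-based training can recover them efficiently.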

In this talk, I will consider three scenarios corresponding to learning sparse functions with two-layer NNs in three optimization regimes: 1) the lazy training regime; 2) convex NNs; and 3) online SGD in the mean-field scaling. In each of these scenarios, we provide a tight characterization of the NNs' performance. In particular, while NNs trained beyond the lazy training regime can indeed adapt to sparsity, computational aspects cannot be ignored. Understanding which sparse functions NNs learn efficiently reveals an interesting hierarchical structure in the target function (the staircase property, illustrated below) and rich behavior in the SGD dynamics (saddle-to-saddle dynamics).
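A minimal example of the staircase structure mentioned above (a standard illustration, not quoted from the abstract): on the hypercube \{-1,+1\}^d, consider

f(z) = z_1 + z_1 z_2 + z_1 z_2 z_3,

where each higher-degree monomial reuses coordinates already present in lower-degree ones. Such a hierarchy lets SGD discover the relevant coordinates incrementally, one step at a time, rather than having to find a degree-3 interaction from scratch.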

This talk is based on joint work with Emmanuel Abbe, Enric Boix-Adsera, Michael Celentano, Hong Hu, Yue M. Lu, Song Mei and Andrea Montanari.

Bio

Theodor Misiakiewicz is a sixth-year PhD student in the Statistics Department at Stanford University, advised by Andrea Montanari. He works on various topics in machine learning theory, statistics, and applied probability. His current main interest is in providing a mathematical foundation for deep neural networks. In particular, he works on their mean-field description and their connection, in certain regimes, to random feature and kernel models.