ISL Colloquium


What Algorithms can Transformers Learn? A Study in Length Generalization

Preetum Nakkiran – Research Scientist, Apple Research

Thu, 7-Mar-2024 / 4:00pm / Packard 202

Abstract

Large language models exhibit many surprising abilities, seemingly generalizing “out-of-distribution” to novel tasks. Yet attempts to replicate such abilities in fully controlled settings often yield mixed results. We conduct a careful study of out-of-distribution generalization in a restricted setting: length generalization on algorithmic tasks. For example: can a model trained to solve 10-digit addition problems generalize to 50-digit addition? For which tasks do we expect this to work, and why?
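To make the setup concrete, here is a minimal, hypothetical sketch of such a length-generalization experiment; the data-generation code below is for illustration only and is not the authors' actual pipeline. Training problems use at most 10 digits, while test problems use 50:

import random

def addition_example(num_digits):
    # Sample one addition problem as (prompt, answer) strings.
    a = random.randrange(10 ** (num_digits - 1), 10 ** num_digits)
    b = random.randrange(10 ** (num_digits - 1), 10 ** num_digits)
    return f"{a}+{b}=", str(a + b)

# In-distribution training lengths (1 to 10 digits) ...
train_set = [addition_example(random.randint(1, 10)) for _ in range(100_000)]
# ... and out-of-distribution test lengths (50 digits).
test_set = [addition_example(50) for _ in range(1_000)]

(The dataset sizes and prompt format here are placeholders chosen for the example.)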

Our key tool is the recently introduced RASP language (Weiss et al., 2021), a programming language designed for the Transformer’s computational model. We conjecture, informally, that Transformers tend to length-generalize on a task if there exists a short RASP program that solves the task for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks, and it can also inform the design of effective scratchpads. Finally, on the theoretical side, we give a simple example separating our conjecture from the “min-degree-interpolator” model of learning of Abbe et al. (2023).
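As a rough illustration of the kind of object the conjecture refers to, the following is a plain-Python sketch of RASP-style select/aggregate primitives (an expository approximation, not the RASP interpreter of Weiss et al.), together with a short program that reverses its input at any length:

def select(keys, queries, predicate):
    # Boolean selection matrix: entry [q][k] is True when
    # predicate(keys[k], queries[q]) holds (analogue of an attention pattern).
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selection, values):
    # For each query position, average the selected values
    # (analogue of uniform attention over the selected positions).
    out = []
    for row in selection:
        picked = [v for v, sel in zip(values, row) if sel]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

def reverse(tokens):
    # A short RASP-style program that works for every input length:
    # position i attends to position n - 1 - i and copies its value.
    n = len(tokens)
    indices = list(range(n))
    sel = select(indices, [n - 1 - i for i in indices], lambda k, q: k == q)
    return aggregate(sel, tokens)

print(reverse([1, 2, 3, 4, 5]))    # -> [5.0, 4.0, 3.0, 2.0, 1.0]

Because the same program text solves the task uniformly at lengths 5 and 50, the conjecture would predict that a Transformer trained on short inputs for such a task should length-generalize.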

Joint work with Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, and Samy Bengio. To appear in ICLR 2024. arXiv: https://arxiv.org/abs/2310.16028

Bio

Preetum Nakkiran is a Research Scientist at Apple. His research builds conceptual tools for understanding learning systems, using both theory and experiment. He has worked on topics including generalization, interpolation, representation, and calibration. Preetum obtained his PhD in Computer Science at Harvard University, advised by Boaz Barak and Madhu Sudan. He did his postdoc at UCSD with Misha Belkin, as part of the NSF/Simons Collaboration on Deep Learning. He has also worked with Google Research and OpenAI, and is a past recipient of the Google PhD Fellowship and the NSF GRFP. Preetum did his undergraduate work in EECS at UC Berkeley.