ISL Colloquium

Negative Stepsizes Make Gradient-Descent-Ascent Converge

Jason Altschuler
Assistant Professor, University of Pennsylvania
Thursday, May 29, 2025 at 4:00 PM • Packard 202

Abstract

Solving min-max problems is a central question in optimization, games, learning, and controls. Arguably the most natural algorithm is Gradient-Descent-Ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred the extensive literature on modifying GDA with extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes.
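For concreteness, here is a minimal sketch of simultaneous GDA on the textbook bilinear problem f(x, y) = x·y, whose unique saddle point is the origin. The stepsize value and starting point are illustrative choices, not taken from the talk; the point is only to show the classical non-convergence with a constant positive stepsize.

```python
import numpy as np

# Simultaneous GDA on f(x, y) = x * y, a convex-concave problem whose unique
# saddle point is (0, 0).  With a constant stepsize eta > 0 the iterates
# satisfy ||z_{t+1}|| = sqrt(1 + eta^2) * ||z_t||, so they spiral outward.
eta = 0.1            # illustrative constant stepsize
x, y = 1.0, 1.0      # illustrative starting point

for t in range(100):
    gx, gy = y, x                        # grad_x f = y,  grad_y f = x
    x, y = x - eta * gx, y + eta * gy    # descend in x, ascend in y

print(np.hypot(x, y))   # distance to the saddle has grown by (1 + eta^2)^50 ~ 1.64
```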

The key innovation is the proposal of unconventional stepsize schedules that are time-varying, asymmetric, and (most surprisingly) periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). The core intuition is that although negative stepsizes make backward progress, they de-synchronize the min/max variables (overcoming the cycling issue of GDA) and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive and negative steps cancel to first order, yielding a second-order net movement in a new direction in which GDA otherwise cannot move, and this movement is what drives convergence. Joint work with Henry Shugart.
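To make the slingshot intuition concrete, the sketch below runs a time-varying, asymmetric, periodically negative schedule on the same bilinear toy problem. The two-step schedule is hand-constructed for this illustration and is not the schedule from the talk or paper; it only shows the qualitative behavior: the negative-stepsize step temporarily moves away from the saddle, and the following step more than compensates, so the distance to the saddle roughly halves every cycle.

```python
import numpy as np

# Same toy problem f(x, y) = x * y.  A time-varying, asymmetric, periodically
# negative stepsize schedule (hand-picked for this example; NOT the schedule
# from the talk).  Each cycle applies two simultaneous GDA steps:
#   step A: (a, b) = (1.0, -1.0)   <- negative stepsize for the ascent player
#   step B: (a, b) = (0.5,  2.0)
schedule = [(1.0, -1.0), (0.5, 2.0)]

z = np.array([1.0, -0.5])            # illustrative starting point
for cycle in range(8):
    for a, b in schedule:
        x, y = z
        z = np.array([x - a * y,     # descent step on x with stepsize a
                      y + b * x])    # ascent step on y with stepsize b
    # After an initial transient, the distance to the saddle halves each cycle,
    # even though step A inside each cycle makes backward progress.
    print(cycle, np.linalg.norm(z))
```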

Bio

Jason Altschuler is an Assistant Professor at UPenn in the Department of Statistics and Data Science, and (by courtesy) also the Departments of Computer Science, Electrical Engineering, and Applied Mathematics. Previously, he received his undergraduate degree from Princeton and his PhD from MIT. He is the recipient of a Sloan Fellowship in Mathematics, the MIT Sprowls Dissertation Award, the Mathematical Optimization Society's Tucker Finalist Prize, and a Wharton Teaching Excellence Award. His research interests lie at the interface of optimization, probability, and machine learning, with a focus on the design and analysis of efficient algorithms.