Title: Optimization and Optimality in Deep Learning: An Anti-Fragile Approach
Date and time: 03 June 2025 (Tuesday), 11:30 a.m. – 12:30 p.m.
Venue: Online
MS Teams Link: Link
Abstract: Stochastic gradient descent (SGD) and Adam are two popular methods for training Deep Neural Networks (DNNs). They are, however, extremely fragile due to the informal nature of hyper-parameter tuning, a key step in setting up any training routine. Even when the stars align, there are few to no guarantees regarding the numerical stability and optimality of DNN training. Together with Anton Linder (a Ph.D. student at Karlstad University), I focused on SGD within the context of supervised and unsupervised deep learning. In this talk, I will discuss a modification to SGD that we call Stochastic Gradient Pemantle Descent (SGPD). It combines SGD with ideas from a classic paper by Robin Pemantle. I will discuss the resulting properties with respect to learning variance, numerical stability and (local) optimality. I will also try to convince you that SGPD is comparable to Adam, and sometimes even better, when it comes to the aforementioned properties. Finally, I will discuss a (not so) similar modification to Simultaneous Perturbation Stochastic Approximation (SPSA), a zeroth-order optimization routine that has recently been used to train Large Language Models. We discuss its (local) optimality using another classic paper, this time by Odile Brandière.
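For context only, the sketch below shows the two baseline routines named in the abstract: a plain SGD update and the standard two-point SPSA gradient estimate. This is not the speaker's SGPD or modified SPSA (those will be presented in the talk); the function names, step sizes and toy loss are illustrative assumptions.

```python
import numpy as np

def sgd_step(theta, grad_fn, lr=0.01):
    """One plain SGD step: move against a stochastic gradient estimate."""
    g = grad_fn(theta)  # stochastic (mini-batch) gradient at theta
    return theta - lr * g

def spsa_gradient(theta, loss_fn, c=0.01, rng=None):
    """Two-point SPSA gradient estimate: perturb all coordinates at once
    along a random +/-1 (Rademacher) direction and use two loss evaluations."""
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # simultaneous perturbation
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    return (loss_plus - loss_minus) / (2.0 * c * delta)  # elementwise estimate

if __name__ == "__main__":
    # Toy quadratic loss, purely for demonstration.
    loss = lambda th: float(np.sum(th ** 2))
    grad = lambda th: 2.0 * th

    theta = np.ones(5)
    for _ in range(100):
        theta = sgd_step(theta, grad, lr=0.1)   # first-order (gradient-based) update
    print("SGD result:", theta)

    theta = np.ones(5)
    for _ in range(200):
        g_hat = spsa_gradient(theta, loss)      # zeroth-order (loss-only) estimate
        theta = theta - 0.1 * g_hat
    print("SPSA result:", theta)
```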
Bio: Arun is an Associate Professor at the Dept. of Mathematics and Computer Science, Karlstad University, Sweden. He completed his Ph.D. at the Indian Institute of Science, India, working on Stochastic Approximation Algorithms and their link to Reinforcement Learning and Stochastic Optimization. He has won two awards and one fellowship for excellence and innovation in Artificial Intelligence. He enjoys working on problems that can be stated simply but require tools from Computer Science and Mathematics to solve.