IEOR e-seminar by Dr. Gugan Thoppe

Title: Improving Sample Efficiency in Evolutionary RL using Off-policy Ranking

Time and Date: 3 pm, Friday, 11 February

Speaker: Dr. Gugan Thoppe, Dept. of Computer Science and Automation, Indian Institute of Science

Abstract: Evolution strategies are powerful optimization techniques in which each iteration ranks candidate solutions by a fitness score. When used in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy methods, leading to increased environmental interactions. We propose a novel, sample-efficient off-policy alternative. Our approach also uses a kernel approximation, making it directly applicable to deterministic policies. We demonstrate our ideas in the context of the Augmented Random Search (ARS) algorithm. Our simulations on MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% of the environmental interactions. It also outperforms the recent Trust Region Evolutionary Strategy.

We believe our ideas can be easily extended to other evolutionary methods. This is joint work with Eshwar S R and Shishir NY Kolathaya.
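For readers unfamiliar with the ranking step the abstract refers to, the following is a minimal sketch of one ARS-style iteration: perturb the policy parameters in random directions, evaluate each perturbed policy with on-policy rollouts, rank the directions by fitness, and update along the top-ranked ones. This is a simplified illustration, not the speaker's code; the toy environment, function names, and hyperparameters are all assumptions for the example. It is precisely the on-policy evaluation inside `ars_step` that the talk proposes to replace with an off-policy estimate.

```python
import numpy as np

def rollout(theta, env_step, horizon=50):
    """On-policy evaluation: run the linear policy a = theta @ s and sum rewards."""
    s = np.ones(theta.shape[1])
    total = 0.0
    for _ in range(horizon):
        a = theta @ s
        s, r = env_step(s, a)
        total += r
    return total

def ars_step(theta, env_step, n_dirs=8, n_top=4, sigma=0.1, lr=0.02, seed=0):
    """One ARS-style iteration: perturb, evaluate, rank, update."""
    rng = np.random.default_rng(seed)
    deltas = rng.standard_normal((n_dirs, *theta.shape))
    # Each fitness evaluation below is a fresh on-policy rollout; this is the
    # interaction cost that off-policy ranking aims to reduce.
    r_plus = np.array([rollout(theta + sigma * d, env_step) for d in deltas])
    r_minus = np.array([rollout(theta - sigma * d, env_step) for d in deltas])
    # Rank directions by max(r+, r-) and keep only the best n_top.
    scores = np.maximum(r_plus, r_minus)
    top = np.argsort(scores)[-n_top:]
    std = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    grad = sum((r_plus[i] - r_minus[i]) * deltas[i] for i in top) / (n_top * std)
    return theta + lr * grad

# Hypothetical toy task: reward is highest when every action component is 1,
# and the state never changes.
def env_step(s, a):
    return s, -float(np.sum((a - 1.0) ** 2))

theta = np.zeros((2, 3))
for t in range(200):
    theta = ars_step(theta, env_step, seed=t)
```

On this toy task the learned policy's return should rise well above that of the initial zero policy; the point of the sketch is only to locate where policy evaluations (and hence environment interactions) occur.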

About the Speaker: Dr. Gugan Thoppe is an Assistant Professor at the Dept. of Computer Science and Automation, Indian Institute of Science. He has done two postdocs: one at Duke University, USA, with Prof. Sayan Mukherjee, and the other at Technion, Israel, with Prof. Robert Adler. He completed his PhD and MS with Prof. Vivek Borkar at TIFR, India. His PhD work won the TAA-Sasken best thesis award for 2017. He is also a two-time recipient of the IBM PhD fellowship award (2013–14 and 2014–15). His research interests include stochastic approximation and random topology and their applications to reinforcement learning and data analysis, respectively.
