Asynchronous Methods for Deep Reinforcement Learning, Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, 2016, ICML 2016, DOI: 10.48550/arXiv.1602.01783 - Introduces A3C (asynchronous advantage actor-critic), in which parallel actor-learners asynchronously update a shared model; a foundation for many later distributed RL methods.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu, 2018, Proceedings of the 35th International Conference on Machine Learning (ICML), Vol. 80 (PMLR), DOI: 10.5555/3305890.3305963 - Presents the IMPALA architecture, which decouples acting from learning and uses V-trace off-policy correction to improve the scalability and throughput of distributed RL.
Distributed Reinforcement Learning: A Survey, Xin Zhang, Bo Zhang, Hanjiang Hu, Chengdong Hu, Wei He, 2020, Neurocomputing, Vol. 418 (Elsevier), DOI: 10.1016/j.neucom.2020.08.006 - Provides a comprehensive overview of distributed reinforcement learning approaches, covering different architectures, challenges, and future directions.